code to parse exported decrypted text file for tags #638

Open
4bob1 opened this issue Jul 24, 2019 · 9 comments

Comments

@4bob1
commented Jul 24, 2019

edit: I wrote MATLAB code to process the exported JSON file and write the notes to separate files, one for each tag. The code is in my GitHub repository https://github.com/4bob1/process-standardnotes-archive. If you are interested in modifying the code for other purposes, leave a comment on this thread.

In issue #366 (open), Caldefredo asked "Is there a way to save the tags within the exported/backup text files?"
Mobitar answered: "The relational text file is just simple JSON, so you could parse that if you wanted to. But otherwise, individual text notes in the zip folder do not contain anything but note contents."

Before I start writing code, can anyone provide the code to parse the relational decrypted text file and output individual text notes that include the note's tags? If so, please provide a link to the code or include it in your answer.

Thanks in advance.

Bob

@4bob1 4bob1 changed the title code to parse exported decrypted text file code to parse exported decrypted text file for tags Jul 24, 2019

@mobitar

Member

commented Jul 25, 2019

Probably the best place to look is at our single page offline decryption script, which decrypts and parses the SN relational text file: https://github.com/standardfile/decrypt/blob/master/decrypt.html

@4bob1

Author

commented Jul 26, 2019

Here is the code I wrote and tested on my exported data. It inputs the exported JSON file, modifies the JSON to add tags to each note, and outputs the modified JSON to a new file. It also writes a set of separate files, one for each note, in a specified directory.

The code is written in MATLAB, but it should also run in GNU Octave, the free, open-source alternative, although I haven't tested that.
https://www.gnu.org/software/octave/

I use the JSONlab toolbox, which is available on the MATLAB File Exchange.
http://www.mathworks.com/matlabcentral/fileexchange/33381-jsonlab-a-toolbox-to-encode-decode-json-files

The functions of this package can decode JSON files into MATLAB data structures and also encode MATLAB data structures into a JSON file. The first section of my code uses the loadjson function from the package to read the Standard Notes exported JSON file.

The result is a cell array of data structures, the 'items' array, with one cell for each item in the user's Standard Notes data. A cell is a MATLAB container that can hold any MATLAB object. In our case, each cell contains a struct, which is a MATLAB structure whose named fields can hold MATLAB objects. Most of the fields in the Standard Notes data are text strings; the content field is itself a struct with fields such as title and text.

-------------------------------- code ---------

%% load the stdnotes output JSON archive
stdNotesJSONFname = 'L:\StandardNotes\BU24Jan2019.txt';
d = loadjson(stdNotesJSONFname);
items = d.items; % a cell array of the items in the JSON file
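
For readers without an export at hand, the layout described above can be mimicked by hand. The following is only a sketch: the uuids, titles and text are made up, the field names are the ones used by the code in this thread, and the content_type values 'Note' and 'Tag' are an assumption (the code in this thread compares them case-insensitively).

```matlab
% Hand-built stand-in for d.items (all values are hypothetical)
note.uuid = 'n-1';
note.content_type = 'Note';
note.content = struct('title','Groceries','text','milk, eggs');

tag.uuid = 't-1';
tag.content_type = 'Tag';
tag.content = struct('title','shopping');
% assigned separately: passing a cell to struct() would create a struct array
tag.content.references = {struct('uuid','n-1','content_type','Note')};

items = {note, tag};  % one cell per item, each cell holds a struct

% cells are indexed with braces, struct fields with dots
first = items{1};
fprintf('%s (%s): %s\n', first.content.title, first.content_type, first.content.text);
```

Brace indexing (items{1}) returns the struct stored in the cell, whereas parenthesis indexing (items(1)) would return a 1x1 cell that still wraps it.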

See the next response for a continued discussion.

@4bob1

Author

commented Jul 26, 2019

There are different kinds of items in the 'items' array. Of interest to us are the note items and the tag items. The next section of code scans through the array to find the indexes of the note items, 'idx2Notes', and of the tag items, 'idx2Tags'. The code also extracts a cell array of the unique IDs of all the items, 'uuids'. It turns out that the array also includes trashed notes, so the code records their indexes in 'idx2Trashed', although we don't use them in the rest of the code.

------code-----------

%% scan items array to find the note and tag items. get uuids for each item
idx2Notes = []; % indexes into items for the notes data
idx2Tags = [];% indexes into items for the tag data
idx2Trashed = [];% indexes into items for the trashed notes
uuids = {}; % accumulate all uuids here
nItems = length(items);
for ki = 1:nItems
    item = items{ki};
    uuids = [uuids; item.uuid];
    type = item.content_type;
    if strcmpi(type,'note')
        content = item.content;
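          % skip trashed notes items; only skip when the field is actually true,
          % in case a note carries the field with a false value
        if isfield(content,'trashed') && isequal(content.trashed,true)
            fprintf('idx %d note trashed %d\n',ki,content.trashed);
            idx2Trashed = [idx2Trashed; ki];
            continue % go to next iteration of loop
        end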
        idx2Notes = [idx2Notes; ki];
    elseif strcmpi(type,'tag')
        idx2Tags = [idx2Tags; ki];
    end
end
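
The same classification can also be written without an explicit loop. The sketch below uses cellfun to pull one field out of every item; the three stand-in items are hypothetical, and the trashed-note check from the loop above is left out for brevity, so this is an illustration of the idea rather than a drop-in replacement.

```matlab
% stand-in items (hypothetical values)
items = { struct('uuid','n-1','content_type','Note'), ...
          struct('uuid','t-1','content_type','Tag'), ...
          struct('uuid','n-2','content_type','Note') };

% collect one field from every item into a cell array of strings
types = cellfun(@(it) it.content_type, items, 'UniformOutput', false);
uuids = cellfun(@(it) it.uuid, items, 'UniformOutput', false);

% case-insensitive match against the type names
idx2Notes = find(strcmpi(types,'note'));  % [1 3]
idx2Tags  = find(strcmpi(types,'tag'));   % 2
```

Here the index vectors come out as rows rather than columns; loops written as `for k = idx2Notes(:)'` work with either orientation.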

continued in next comment

@4bob1

Author

commented Jul 26, 2019

The next section of code goes through all the tag items. Each tag item has a references member, which contains the unique IDs of all the notes with that tag. For each reference, the code searches the uuids array to find the index of the referenced note, then modifies that note's content member: it adds a tags field if none exists, or appends the tag name (the tag's title) to the existing field, separated by a comma.
---------code---------------

%% go through tag items to add the tag title as a tag for each tag reference to a note
for idx2tag = idx2Tags(:)'
    item = items{idx2tag}; % the tag item struct
    assert(strcmpi(item.content_type,'tag')); % make sure got tag data
    refs = item.content.references; % these are refs to notes with this tag
    nrefs = length(refs);
    for kref = 1:nrefs
          % for each reference add the tag title to the item.content.tags struct field
        uuid = refs{kref}.uuid;
        idx2note = find(ismember(uuids,uuid)); % find the idx of note with the uuid
        assert(numel(idx2note)==1); % uuid is unique so only one match
        item4Note = items{idx2note}; % the selected note item struct
        assert(strcmpi(item4Note.content_type,'note')); % should be a note
        if isfield(item4Note.content,'tags')
              % if note already has tags field, append the current tag
            item4Note.content.tags = sprintf('%s,%s',item4Note.content.tags,item.content.title);
        else
            item4Note.content.tags = item.content.title; % make tags field with current tag
        end
        items{idx2note} = item4Note; % overwrite the old item with new one with tags field
    end 
end
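
A possible variation on the uuid lookup is to build a containers.Map once and query it for each reference, instead of searching the whole uuids array every time. The sketch below uses made-up uuids, and the dangling reference is an assumption included only to show how isKey lets the loop skip a uuid that is not in the export rather than stop on an assert.

```matlab
% hypothetical uuids, listed in the same order as the items array
uuids = {'n-1','t-1','n-2'};

% build the lookup table once: uuid string -> index into items
uuid2idx = containers.Map(uuids, num2cell(1:numel(uuids)));

% references as they might appear in one tag item; the last one is dangling
refs = {struct('uuid','n-2'), struct('uuid','n-1'), struct('uuid','gone')};

found = [];
for kref = 1:numel(refs)
    uuid = refs{kref}.uuid;
    if ~isKey(uuid2idx, uuid)
        fprintf('skipping unknown uuid %s\n', uuid);
        continue
    end
    found = [found, uuid2idx(uuid)];
end
disp(found)  % indexes of the referenced notes, in reference order
```

For a few hundred notes the difference in speed is unlikely to matter; the main gain is that unknown uuids are handled explicitly.
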
@4bob1

Author

commented Jul 26, 2019

Next the code uses the savejson function from the JSONlab package to save the modified items array to a new file.

-------code-------------

%% save the updated items cell array to a new JSON file
clear dnew % just to be safe
dnew.items = items;
newJSONFname = 'L:\StandardNotes\BU24Jan2019WithTags.txt';
json = savejson('',dnew,newJSONFname);
@4bob1

Author

commented Jul 26, 2019

Finally, the code saves the notes as text files in a specified directory. The names of the files are the note titles, so there are several regular expression replacements to turn each title into a legal file name.

I intend to modify the code to output a file that can be input to InfoSelect, which I have used for many years as my desktop computer personal information manager.

--------code----------

%% save the notes as text files with tags as first line
dir4Notes = 'L:\StandardNotes\NotesTmp\';
if exist(dir4Notes,'dir')
    rmdir(dir4Notes,'s'); % delete any prev version and its contents
end
mkdir(dir4Notes);
fnameStripRegExp = '[ ]+|[-]+'; % regular expression for symbols to replace in the file name
maxTitleLength = 30;
for kn = idx2Notes(:)'
    item = items{kn};
    title = item.content.title;
    if numel(title)>maxTitleLength % clip long titles
        title = title(1:maxTitleLength);
    end
      % clean up the title only, so the directory part of the path is left untouched
    title = regexprep(title,fnameStripRegExp,'_');
      % get rid of more illegal symbols from file name
    title = regexprep(title,'\?','_'); % question mark
    title = regexprep(title,'\/','_'); % forward slash
    title = regexprep(title,'\"','_'); % quotation mark
    fname = [dir4Notes title '.txt'];
    fp = fopen(fname,'w'); % open for writing
    assert(fp>0); % no error on open
    WriteStdNote(fp,item); % write out the item including tags to a file
    fclose(fp);
end
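
The title clean-up can also be gathered into one small helper that is applied to the title before the directory is prepended. The sketch below is one way to do that under stated assumptions: the names clip and sanitize are hypothetical, the sample title is made up, and the character class covers the characters Windows reserves in file names (\ / : * ? " < > |), which is a wider set than the code above handles.

```matlab
% hypothetical helpers: turn a note title into a file-name-safe string
maxTitleLength = 30;
clip = @(t) t(1:min(numel(t),maxTitleLength));        % clip long titles
sanitize = @(t) regexprep( ...
    regexprep(clip(t),'[\\/:*?"<>|]','_'), ...        % reserved characters
    '[ \-]+','_');                                    % runs of spaces and dashes

dir4Notes = 'L:\StandardNotes\NotesTmp\';
noteTitle = 'What is 1/2 of "x"? - draft';            % made-up title
fname = [dir4Notes sanitize(noteTitle) '.txt'];       % directory is not rewritten
disp(fname)
```

Two notes whose titles clean up to the same string would still overwrite each other; appending part of the note's uuid to the file name is one way around that.
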
@4bob1

Author

commented Jul 26, 2019

The code calls the WriteStdNote function to actually write out the note text:

function [nbytes] = WriteStdNote(fp, item)
% function [nbytes] = WriteStdNote(fp, item)
% write the tags (if any) and note text to a file
% inputs:
%     fp: a fileid from fopen
%     item: an item struct from the exported JSON file
% outputs:
%     nbytes: the number of bytes from the last fprintf write

content = item.content;
if isfield(content,'tags')
    % if tags field exists, write them out
    fprintf(fp,'tags:');
    fprintf(fp,' %s',content.tags);
    fprintf(fp,'\n');
end
nbytes = fprintf(fp,'%s',content.text); % '%s' keeps any % or \ in the note text literal
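
One detail worth keeping in mind when writing note text with fprintf: the first text argument is treated as a format string, so percent signs and backslashes in it are interpreted rather than copied. The sketch below shows the difference with sprintf, which follows the same formatting rules; the sample text is made up.

```matlab
txt = 'Progress: 50% done, see C:\new\notes';  % hypothetical note text

% passed as data through '%s', the text is copied verbatim
safe = sprintf('%s', txt);
assert(strcmp(safe, txt));

% used as the format string itself, the same text is interpreted:
% '% d' is read as a conversion and '\n' as a newline
risky = sprintf(txt);
assert(~strcmp(risky, txt));
```
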
@mobitar

Member

commented Jul 29, 2019

Very cool! Not familiar with matlab but is it possible to create a reusable package from the code above?

@4bob1

Author

commented Jul 30, 2019

Thanks. See the first comment. If there is any interest in a reusable package, I will look into converting to a standalone program or other application.

2 participants