database to markdown file #14

Closed
sanketgarade opened this issue Jul 7, 2021 · 21 comments

@sanketgarade
Contributor

sanketgarade commented Jul 7, 2021

This script is needed to create a single markdown file containing all the words from the database csv file.

The output format for each word and its Marathi equivalent is given in the example present in the template folder.

@sanketgarade
Contributor Author

  • input file will be the db.csv
  • output will be a markdown file which will be used on the github pages website. for now it will be the home page of the site.
  • for now, a user will have to manually search for a word of interest (or can also use the browser's search function.)

@sanketgarade
Contributor Author

prerequisites -

  • have a csv file with content in en and mr columns, at least.
  • have a template markdown file for the output

steps -

  1. read the csv file
  2. create a new markdown file from the template
  3. extract the en and mr words from a row of the csv
  4. fill the extracted words in the markdown file
  5. repeat 3-4 till all rows of csv are done
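
The steps above can be sketched roughly as follows. This is only a minimal sketch: the `BLOCK_TEMPLATE` string and the `csv_to_markdown` name are hypothetical (the real template is the one in the template folder), and it assumes the csv has a header row with `en` and `mr` columns.

```python
import csv
import io

# Hypothetical placeholder template; the actual one lives in the template folder.
BLOCK_TEMPLATE = "### {en}\n\n{mr}\n"


def csv_to_markdown(csv_file, template=BLOCK_TEMPLATE):
    """Read rows from an open csv file and return one markdown string."""
    reader = csv.DictReader(csv_file)  # step 1: read the csv
    blocks = []
    for row in reader:  # step 5: repeat for every row
        # step 3: extract the en and mr words from the row
        en, mr = row["en"].strip(), row["mr"].strip()
        # step 4: fill the words into the template
        blocks.append(template.format(en=en, mr=mr))
    return "\n".join(blocks)  # step 2: the text of the new markdown file


if __name__ == "__main__":
    sample = io.StringIO("en,mr\nwater,पाणी\nbook,पुस्तक\n")
    print(csv_to_markdown(sample))
```

For the real db.csv, the caller would open the file with `encoding="utf-8"` and `newline=""` and write the returned text to the output md file.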

@zarbod
Collaborator

zarbod commented Jul 7, 2021

Do you have the markdown file template ready?

@sanketgarade
Contributor Author

Do you have the markdown file template ready?

Yes. It's there in the template folder. Not in a template shape right now but more like an example.

If you want the exact template with placeholders, then I will prepare it later today. But it won't be much different from the example file that is present there currently. If it suits you, you can begin with that and later update your script once the final template is ready.

The example file shows 3 different ways to arrange the output. Please use the 1st option for now.

@sanketgarade
Contributor Author

template is added. pls check the explanation in the readme file in the template folder.

@sanketgarade
Contributor Author

I merged part of the PR #13 into main branch. tested ok at my end.
I will close the PR.

There are some enhancements that can be done. I will think and let you know.

@zarbod
Collaborator

zarbod commented Jul 8, 2021

Hey, do you have anything for me? I have time to work on the project.

@sanketgarade
Contributor Author

sanketgarade commented Jul 8, 2021 via email

@sanketgarade
Contributor Author

sanketgarade commented Jul 8, 2021

w.r.t. the existing scripts there are some optimisations that can be done.
lets try to write programs using the unix philosophy. basically these
2 points for now -

  • Write programs that do one thing and do it well.
  • Write programs to work together.

applying this to your sort and gen-md scripts, we can do the following -

  1. sort -
  • make the sorting function universal/generic instead of specific to the input
    csv format.
  • like, determine the number of columns in the csv from the number of elements
    in the top row instead of hardcoded values.
  • when sorting we can pass both the column index to be used for sorting and
    the order of sort as arguments.
  • let the sort function return the output csv file (or its instance) instead
    of saving the file in some location.
  • so that the calling function can decide what to do with the returned file.
  2. generate-markdown
  • lets make one function which outputs exactly one block (the structure in the
    word template.md file)
  • this function will work on only 1 row of csv text stream and output the
    text stream for 1 block of the output.
  • once again let the caller function pass the csv stream as input and catch
    the output stream as a return value.
  • the idea here is that we can reuse this function in multiple places, where
    we need to generate md file for entire library or for topic based words, or
    for md files split as per word initials etc.
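
The two functions described above might look roughly like this. It is a sketch under assumptions, not the project's actual scripts: the names `sort_rows` and `gen_block` and the default template string are hypothetical, the csv is assumed to have a header row, and every data row is assumed to have as many fields as the header.

```python
import csv
import io


def sort_rows(csv_text, column=0, descending=False):
    """Sort the data rows of csv text by a column index and return csv text.

    The header (top row) stays first and also tells us how many columns
    there are, so nothing about the layout is hardcoded.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    if not 0 <= column < len(header):
        raise IndexError(f"column {column} is outside 0..{len(header) - 1}")
    # casefold() makes the ordering case-insensitive.
    body.sort(key=lambda r: r[column].casefold(), reverse=descending)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows([header] + body)
    return out.getvalue()  # the caller decides whether to save it


def gen_block(row, template="### {en}\n\n{mr}\n"):
    """Turn exactly one csv row (as a dict) into one markdown block."""
    # Hypothetical template; swap in the text of the word template.md file.
    return template.format_map(row)
```

Because both functions take text in and give text back, the caller can chain them, for example sorting first and then passing each row from `csv.DictReader` to `gen_block`.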

and then write a parent function which calls these and does the needed thing as
per the type of output md file needed.
I am still working on this, that is the types of files we need to create. But
they will be something like -

  • entire library in 1 file
  • 1 file for each letter of the alphabet (like A.md will contain all words starting with the
    letter 'a', B.md for 'b' and so on..)
  • 1 file for each topic
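
A parent function for these output types could be sketched as below. Everything named here is an assumption made for illustration: `build_outputs`, `render_block`, the `library.md` file name, and especially the `topic` column, which may not exist in db.csv under that name.

```python
import csv
import io
from collections import defaultdict


def render_block(row):
    # Hypothetical stand-in for the one-block generator described above.
    return f"### {row['en']}\n\n{row['mr']}\n"


def build_outputs(csv_text, mode="all"):
    """Group rows by output file and return {file name: markdown text}.

    mode="all"     -> entire library in 1 file
    mode="initial" -> 1 file per first letter (A.md, B.md, ...)
    mode="topic"   -> 1 file per value in the assumed "topic" column
    """
    if mode not in ("all", "initial", "topic"):
        raise ValueError(f"unknown mode: {mode!r}")
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if mode == "all":
            name = "library.md"
        elif mode == "initial":
            name = row["en"].strip()[:1].upper() + ".md"
        else:
            name = row["topic"].strip() + ".md"
        groups[name].append(render_block(row))
    # Nothing is written to disk; the caller decides where the files go.
    return {name: "\n".join(blocks) for name, blocks in groups.items()}
```

Returning a dict instead of writing files keeps the function reusable: the same call can feed the website build, a test, or a one-off export.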

@zarbod
Collaborator

zarbod commented Jul 8, 2021

Thanks! I can make the optimizations for the generate-markdown script fairly quickly so I'll do those first.

@sanketgarade
Contributor Author

i have hosted the website with some dummy links under the "browse" section. pls have a look at it. you'll get an idea of the type of outputs we need to generate.

@sanketgarade
Contributor Author

I have merged pr #18. Thanks!

I will now create a .py file with pseudo code for the parent function to make output md file for entire library and other types of output files (topic, alphabetical etc.). You can then use that to write your code.

@sanketgarade
Contributor Author

sanketgarade commented Jul 12, 2021

@zarbod hi, did you see the .py files I added in the src folder and the pseudo code added in those? I have updated part of the db.csv file and would like to create at least the md output file for all words (the entire library link on the website). Pls let me know when you are planning to implement those scripts. In case any part is not clear, let me know.

@zarbod
Collaborator

zarbod commented Jul 12, 2021 via email

@sanketgarade
Contributor Author

sanketgarade commented Jul 12, 2021

thanks. please make sure to pull the latest repo first since I made some updates.
also on the website as of now 3 links point to dummy files (entire lib, topics and "a" initial words).
Once your scripts are ready, we can run those on the db.csv file and put the md files containing the actual words from the database onto these links :)

@sanketgarade
Contributor Author

@zarbod now that the filter script is done, we could continue with the gen-out and gen-block files so that we can use them together to generate the specific MD files.

Let me know if you can start on these.

@zarbod
Collaborator

zarbod commented Jul 16, 2021

I can start working on those in the afternoon.

@sanketgarade
Contributor Author

@zarbod
Have you started working on this? If it is going to take you a while, let me know. In that case I will also try writing the program from my side in the meantime, because I want to upload the browse site pages as soon as possible.

@zarbod
Collaborator

zarbod commented Jul 18, 2021 via email

@sanketgarade
Contributor Author

Sounds good. 👍🏼

@sanketgarade
Contributor Author

Closing this since the basic operation is working fine. Will open separate issues for specific enhancements.
