Fourth version #33

anamika-yadav99 · 2022-08-15T11:55:56Z

Addresses issue #30 and issue #32


manulera · 2022-08-15T17:36:02Z

Hi Anamika,

I haven't had a look at the code yet, but you should invest more time in the documentation. Your readme does not address the points that I raised in our previous email exchange:

You should explain what your pipeline does, so that someone who comes to the repo for the first time understands it. This should cover:

- How/where allele features are defined.
- What the output files look like, with some examples, e.g. take an example genotype and explain what the output in each file will be. (Not addressed)
- How to download the fluorescent protein data (this could go in the readme section "Getting the data"). (Not addressed)

Also, the issue mentioned that there should be:

- A step-by-step guide on how to run the pipeline, using the publicly available strains in nbrp_strains. (Should be improved)
- Instructions to write your own format.py, with an example. (Should be improved)
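As an illustration of the kind of example the readme could include, here is a minimal sketch of a format.py. It assumes that the job of format.py is to turn a lab's own strain list into a strains.tsv with the columns `strain_id` and `genotype`; those column names, the input column names `Name` and `Genotype`, the helper functions, and the example rows are all hypothetical and not taken from the repository.

```python
import csv
import io


def format_strains(raw_rows, id_column, genotype_column):
    """Return (strain_id, genotype) pairs, skipping rows without an id or genotype."""
    formatted = []
    for row in raw_rows:
        strain_id = row[id_column].strip()
        genotype = row[genotype_column].strip()
        if not strain_id or not genotype:
            continue
        # Collapse runs of whitespace so alleles are separated by single spaces
        formatted.append((strain_id, ' '.join(genotype.split())))
    return formatted


def write_strains_tsv(pairs, handle):
    """Write the pairs as tab-separated values with an assumed two-column header."""
    writer = csv.writer(handle, delimiter='\t', lineterminator='\n')
    writer.writerow(['strain_id', 'genotype'])
    writer.writerows(pairs)


# Hypothetical rows standing in for a lab's spreadsheet export
raw = [
    {'Name': 'FY001 ', 'Genotype': 'h-  ade6-M210   leu1-32'},
    {'Name': 'FY002', 'Genotype': ''},
]

buffer = io.StringIO()
write_strains_tsv(format_strains(raw, 'Name', 'Genotype'), buffer)
print(buffer.getvalue())
```

In a real format.py the rows would come from the lab's file and the output would be written to disk rather than to an in-memory buffer.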

You should start the documentation of the pipeline by briefly explaining:

- What the starting data is (lists of genotypes).
- What information we want to extract from them.
- What the extra input files that we use are and what they contain (which types of allele features, and how they are defined in the toml files).
- What the output for a particular example line in strains.tsv looks like in:
  - All the output files of build_nltk_tags
  - All the output files of summary_nltg_tags

Some other things I noticed so far:

- summary_nltg_tags should be renamed to summary_nltk_tags.
- The tests should be improved. When testing, you should use a shorter test_strains.tsv, but verify that all generated files are correct, both the ones produced by build_nltk_tags and the ones produced by summary_nltk_tags.
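One possible way to check every generated file is sketched below. It assumes that the expected outputs for the short test_strains.tsv are stored in a directory next to the tests and that each generated file must match its expected counterpart byte for byte. The helper `find_mismatched_files` and the file name `summary.tsv` are hypothetical, and the temporary directories only stand in for real pipeline output.

```python
import filecmp
import tempfile
from pathlib import Path


def find_mismatched_files(generated_dir, expected_dir):
    """Return names of expected files that are missing from, or differ in, generated_dir."""
    mismatched = []
    for expected_file in sorted(Path(expected_dir).iterdir()):
        generated_file = Path(generated_dir) / expected_file.name
        # shallow=False forces a comparison of file contents, not just os.stat() data
        if not generated_file.is_file() or not filecmp.cmp(
            generated_file, expected_file, shallow=False
        ):
            mismatched.append(expected_file.name)
    return mismatched


# Throwaway directories standing in for the generated and expected outputs
with tempfile.TemporaryDirectory() as generated, tempfile.TemporaryDirectory() as expected:
    for name, content in [('alleles.json', '[]'), ('summary.tsv', 'a\tb\n')]:
        (Path(expected) / name).write_text(content)
    (Path(generated) / 'alleles.json').write_text('[]')
    (Path(generated) / 'summary.tsv').write_text('a\tc\n')  # deliberately different
    demo_result = find_mismatched_files(generated, expected)

print(demo_result)
```

A test could then run both scripts on the short input and assert that the returned list is empty, so that any file that is missing or has changed is reported by name.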

anamika-yadav99 and others added 30 commits July 12, 2022 19:01

cords included in alleles.json

05e8e4d

coords

506f797

sorted by coordinates

755932e

test_third_version_py

635e899

test allele coordinates are sorted

428e202

bug fix

78c316e

WIP

129327e

tsv file unidentified occurence

472dbec

minor change

48aaf36

Merge branch 'third_version' of https://github.com/manulera/genestori…

54e506e

…an_data_refinement into third_version

small changes by manu

239990f

minor changes

9b2e302

Merge branch 'third_version' of https://github.com/manulera/genestori…

3d000a8

…an_data_refinement into third_version

fourth_version and format.py

e720561

remove extra folder

d6e1818

manu's changes

8ccac7a

Merge branch 'master' of https://github.com/manulera/genestorian_data…

67ba177

…_refinement into fourth_version

Merge branch 'fourth_version' of https://github.com/manulera/genestor…

15fe893

…ian_data_refinement into fourth_version

code for fpbase data

1d2f543

fixed occurrence spellings

58b726e

added test_occurrence.json

bd22c25

tests for 4th version

3d4b8b6

readme updated

2b04edb

updates

b0c1064

summary_pattern.py

a1c72ed

changes zoom

e2cfd50

nbrp strains

6eb72f5

4th version completed

ef513bf

test file renamed

a1451a1

minor change

c5808f8

anamika-yadav99 and others added 17 commits August 18, 2022 15:35

readme WIP test- Completed

2fb4264

readme

8e508ca

minor change

7993a33

formatting fixed

a5bc463

minor fixes

2bbef9e

..

0b4b85a

add the fpbase toml file to gitignore

42f12af

switched summary_pattern code to script in module

a08ba57

remove return None, not needed

a961489

simplified count_most_common_other_tag

512fa29

cleanup

cf4d1f9

comments on readme added

2b652ed

test cases

3b182e7

test file tsv

c6ebae5

more changes to readme

25a3cea

format of pattern changed

70df015

readme updated

a83dae3

manulera approved these changes Aug 23, 2022


manulera merged commit f35eb72 into master Aug 23, 2022

manulera deleted the fourth_version branch August 23, 2022 18:16
