New feature/fwd and rev reads v2 #6

JGroos-16 · 2022-07-06T13:51:01Z

No description provided.

sjanssen2 · 2022-07-06T14:29:19Z

README.md

    Module: os
    Module: subprocess
    Module: argparse
+    Module: itertools
    Module: tempfile
    Module: shutil


I like the idea of documenting the dependencies, but all listed modules are part of the Python standard library and ship with every Python installation, so this list is too detailed. If you start using third-party libraries such as pandas or seaborn, we can list them here AND have to add them to the environment.yml file.

sjanssen2 · 2022-07-06T14:29:57Z

README.md

    To Do List:
        -Currently the used programms are refrenced by hardcoded paths
            -> not usable on PCs where these programms are saved somewhere else
-        -Currently only the single most probable region is shown. 
-            -> Modify in a way that reads spanning multiple regions are shown correctly
        -Adding other functions such as primer verification or making the programm more efficiant


a to-do list is better realized as one or more issues on GitHub

sjanssen2 · 2022-07-06T14:32:13Z

vxdetector/Output_counter.py

+    dictionary['V1'] = round((dictionary['V1'] / count) * aligned_count, 2)
+    dictionary['V2'] = round((dictionary['V2'] / count) * aligned_count, 2)
+    dictionary['V3'] = round((dictionary['V3'] / count) * aligned_count, 2)
+    dictionary['V4'] = round((dictionary['V4'] / count) * aligned_count, 2)
+    dictionary['V5'] = round((dictionary['V5'] / count) * aligned_count, 2)
+    dictionary['V6'] = round((dictionary['V6'] / count) * aligned_count, 2)
+    dictionary['V7'] = round((dictionary['V7'] / count) * aligned_count, 2)
+    dictionary['V8'] = round((dictionary['V8'] / count) * aligned_count, 2)
+    dictionary['V9'] = round((dictionary['V9'] / count) * aligned_count, 2)


avoid this type of code duplication by using a loop:

    for vregion in ['V1', 'V2']:
        dictionary[vregion] = round((dictionary[vregion] / count) * aligned_count, 2)
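For illustration, here is the same idea spelled out for all nine regions. This is only a sketch: the function name `scale_regions` and the sample numbers are made up, and `count` and `aligned_count` are assumed to mean what they do in the diff above.

```python
def scale_regions(dictionary, count, aligned_count):
    """Scale every region entry in place, replacing the nine copied lines."""
    for vregion in ['V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7', 'V8', 'V9']:
        dictionary[vregion] = round(
            (dictionary[vregion] / count) * aligned_count, 2)
    return dictionary


# Made-up example values: 30 reads in V3, 70 reads in V4
example = {f'V{i}': 0 for i in range(1, 10)}
example['V3'] = 30
example['V4'] = 70
scaled = scale_regions(example, count=100, aligned_count=90.0)
```

Adding or removing a region then means touching one list instead of nine nearly identical lines.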

sjanssen2 · 2022-07-06T14:33:40Z

vxdetector/Output_counter.py

+    return sum(buf.count(b'\n') for buf in bufgen)
+
+
+def region_count(dictionary, unaligned_count, temp_path, all_reads, mode):


it might be worth switching to pandas instead of operating on dictionaries: https://pandas.pydata.org/

sjanssen2 · 2022-07-06T14:34:34Z

vxdetector/Output_counter.py

+    count = sum([dictionary['V1'], dictionary['V2'], dictionary['V3'],
+                dictionary['V4'], dictionary['V5'], dictionary['V6'],
+                dictionary['V7'], dictionary['V8'], dictionary['V9'], no_V])
+    dictionary['V1'] = round((dictionary['V1'] / count) * aligned_count, 2)


we should differentiate between internal computation (= no need to cut off decimal places) and the user report (= no need to show more than 2 decimal places)
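One possible way to keep the two concerns apart is sketched below. The function names `region_percentages` and `for_display` and the read counts are hypothetical and not taken from the code base; the point is only that values stay at full precision until they are formatted for the user.

```python
def region_percentages(counts):
    """Return full-precision percentages; nothing is rounded here."""
    total = sum(counts.values())
    return {region: 100 * n / total for region, n in counts.items()}


def for_display(value, digits=2):
    """Format a number for the user report with a fixed number of digits."""
    return f'{value:.{digits}f}'


counts = {'V3': 1, 'V4': 2}                      # made-up read counts
percentages = region_percentages(counts)         # used for further maths
shown = {region: for_display(value)              # used only for output
         for region, value in percentages.items()}
```

Rounding early, as in the diff, would feed the already rounded numbers into any later calculation.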

sjanssen2 · 2022-07-06T14:35:58Z

vxdetector/Output_counter.py

+    V1 = str(dictionary['V1'])
+    V2 = str(dictionary['V2'])
+    V3 = str(dictionary['V3'])
+    V4 = str(dictionary['V4'])
+    V5 = str(dictionary['V5'])
+    V6 = str(dictionary['V6'])
+    V7 = str(dictionary['V7'])
+    V8 = str(dictionary['V8'])
+    V9 = str(dictionary['V9'])


use a loop!

sjanssen2 · 2022-07-06T14:36:51Z

vxdetector/Output_counter.py

+    print(f'\n{str(unaligned_count)}% were unaligned.')
+    print(f'Of all the aligned Reads most were aligned to: {most_probable_V}')
+    print('The probabilities of all regions is as follows [%]:')
+    print(f'V1: {V1}\nV2: {V2}\nV3: {V3}\nV4: {V4}\nV5: {V5}\n \


again, a loop might be easier: '\n'.join(['V1', 'V2'])
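A rough sketch of what the join could look like for the report lines. The helper name `region_report` and the percentages are invented for the example, and the dictionary is shortened to three regions.

```python
def region_report(dictionary):
    """Build one 'name: value' line per region and join them with newlines."""
    return '\n'.join(f'{name}: {value}' for name, value in dictionary.items())


# Made-up percentages for three regions only, to keep the example short
example = {'V1': 0.0, 'V2': 12.5, 'V3': 87.5}
print('The probabilities of all regions is as follows [%]:')
print(region_report(example))
```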

sjanssen2 · 2022-07-06T14:38:16Z

vxdetector/Output_counter.py

-    BED_path = temp_path + 'BED.bed'
-    Log_path = temp_path + 'bowtie2.log'
+def count(temp_path, file_name, file_type, path, dir_name, dir_path, mode):
+    dictionary = {'V1': 0, 'V2': 0, 'V3': 0, 'V4': 0, 'V5': 0,


we should have a global constant listing the variable region names. It would then be easy to loop through these regions. Should we add/remove one, we would only have to update in a single location!
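A minimal sketch of such a constant, assuming a module-level tuple is acceptable. The names `REGIONS` and `empty_counts` are suggestions only and do not exist in the code base.

```python
# Single place that defines which variable regions exist (hypothetical name).
REGIONS = ('V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7', 'V8', 'V9')


def empty_counts():
    """Return a fresh counter dictionary with one zero entry per region."""
    return dict.fromkeys(REGIONS, 0)


counts = empty_counts()
counts['V4'] += 3                                # made-up update
most_probable = max(REGIONS, key=counts.get)     # region with highest count
```

Returning a new dictionary from a function, rather than sharing one mutable global, avoids counts leaking from one input file into the next.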

sjanssen2 · 2022-07-06T14:39:24Z

vxdetector/files_manager.py

-    """findet die richtigen ordner/dateien, damit das programm funktioniert"""
-    programm_path = os.path.dirname(__file__) + '/'
-    if os.path.exists(programm_path+'Output/'):
+    # findet die richtigen ordner/dateien, damit das programm funktioniert


could you please translate this to English (roughly: "finds the right folders/files so that the program works"), in case we ever pass this code base to the UCSD guys?

sjanssen2 · 2022-07-06T14:40:24Z

vxdetector/interact_bowtie2.py



 def buildbowtie2(path):
-    bowtie2_path = '/vol/software/bin/bowtie2-build'


we will need a better mechanism to specify binary program locations, like a config file or command-line arguments
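As one possible direction, the sketch below takes the location from a command-line option and falls back to searching PATH. The option name `--bowtie2-build` and both helper functions are assumptions made for this example; only `argparse` and `shutil.which` are real standard-library features.

```python
import argparse
import shutil


def build_parser():
    """Parser with a (hypothetical) option for the bowtie2-build location."""
    parser = argparse.ArgumentParser(
        description='Sketch: configurable tool paths')
    parser.add_argument(
        '--bowtie2-build',
        default='bowtie2-build',
        help='name or full path of the bowtie2-build executable')
    return parser


def resolve_program(name_or_path):
    """Return the full path of an executable, or raise if it is not found."""
    resolved = shutil.which(name_or_path)
    if resolved is None:
        raise FileNotFoundError(f'cannot find executable: {name_or_path}')
    return resolved


# The old hardcoded location becomes just one possible argument value.
args = build_parser().parse_args(
    ['--bowtie2-build', '/vol/software/bin/bowtie2-build'])
```

With the default value, `resolve_program` looks the tool up on PATH, so an activated conda environment would work without any extra option.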

sjanssen2

looks good to me, just a few tiny change requests

sjanssen2 · 2022-07-08T09:18:16Z

vxdetector/Output_counter.py

 import csv
 from itertools import (takewhile, repeat)

+dictionary = {'V1': 0, 'V2': 0, 'V3': 0, 'V4': 0, 'V5': 0,


great! Can you give the variable a more descriptive name, like 'regions' or 'fragments' or ...?

sjanssen2 · 2022-07-08T09:19:04Z

vxdetector/Output_counter.py

    dir_path = dir_name.replace(dir_path, '', 1)
+    # only leaves the part in the directory tree between
+    # the file and the original given directory path
    if dir_name.split('/')[-1] == '':


isn't that the same as os.path.dirname or os.path.basename?
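For comparison, this is how the `os.path` helpers behave on a path with a trailing slash. The paths are made up and POSIX-style separators are assumed; whether `os.path.relpath` matches exactly what the diff intends would need checking against the real inputs.

```python
import os.path

dir_name = 'data/run1/sampleA/'    # made-up directory with trailing slash
dir_path = 'data/'                 # made-up original input directory

# basename of a path ending in '/' is empty, hence the special case above
raw_basename = os.path.basename(dir_name)

# normpath strips the trailing slash, so basename/dirname behave as expected
clean = os.path.normpath(dir_name)
last_component = os.path.basename(clean)
parent = os.path.dirname(clean)

# part of the directory tree between the original directory and dir_name
relative = os.path.relpath(dir_name, start=dir_path)
```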

sjanssen2 · 2022-07-08T09:21:57Z

vxdetector/Output_counter.py

            writer.writeheader()
+        for key in dictionary:
+            dictionary[key] = round(dictionary[key], 2)
        writer.writerow({'Read-file': file_name, 'Number of Reads': all_reads,


since you now work with a global variable, we should avoid manual iteration of keys here. How about

    foo = {'Read-file': file_name, 'Number of Reads': all_reads, ...}
    foo.update(dictionary)
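A small runnable version of that suggestion, writing to an in-memory buffer instead of a file. The file name, read count and region values are invented, and the dictionary is shortened to three regions.

```python
import csv
import io

regions = {'V1': 0.0, 'V2': 12.3456, 'V3': 87.6544}   # made-up values

# Fixed columns first, then all region columns in one step
row = {'Read-file': 'sample.fastq', 'Number of Reads': 1000}
row.update({name: round(value, 2) for name, value in regions.items()})

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
output = buffer.getvalue()
```

Because the column names come from the dictionary itself, a new region shows up in the CSV without any change to the writing code.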

sjanssen2 · 2022-07-12T06:12:40Z

Indexed_bt2/code_for_reference/create_annoted_ref.py

+            region = {'V1_start': '', 'V1_end': '',
+                      'V2_start': '', 'V2_end': '',
+                      'V3_start': '', 'V3_end': '',
+                      'V4_start': '', 'V4_end': '',
+                      'V5_start': '', 'V5_end': '',
+                      'V6_start': '', 'V6_end': '',
+                      'V7_start': '', 'V7_end': '',
+                      'V8_start': '', 'V8_end': '',
+                      'V9_start': '', 'V9_end': ''}


you might want to use a global dictionary as in the main program here as well to enable use of loops!
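A sketch of how the eighteen keys could be generated from one list of region names. `REGIONS` is a hypothetical name here; in practice it could be the same constant the main program uses.

```python
REGIONS = [f'V{i}' for i in range(1, 10)]    # 'V1' ... 'V9'

# One '<region>_start' and one '<region>_end' key per region, all empty
region = {f'{name}_{edge}': ''
          for name in REGIONS
          for edge in ('start', 'end')}
```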

Johannes Groos and others added 7 commits June 29, 2022 14:51

9ba58ce · added the option to align paired reads. Removed interact_samtools since it wasn't necessary

1a24b11 · fixed an issue in Output_counter where no .csv file was created if input path and directory path of fastq files was identical.

e46a260 · added an unpaired counter. Also added some lines to allow for multiple 'most probable regions'. Added a column to keep track of how many reads are mapped to conserved regions.

37782e2 · fixed some issues and improved output

2ea3c9d · commented out the temporary SAM file creation

6cf18b5 · Merge branch 'master' of github.com:jlab/algorithm_vxdetector into New_Feature/fwd-and-rev-reads

5b690ae · Style changes

JGroos-16 requested a review from sjanssen2 July 6, 2022 13:54

sjanssen2 requested changes Jul 6, 2022

30dec78 · added comments, dictionary now global variable, looped through dictionary and changed the way programs such as bowtie2 are called.

sjanssen2 requested changes Jul 8, 2022

d67cf91 · Added the program which looks for boundary regions in otus_aligned; removed indexed bowtie2 files; renamed global dictionary and create_output no longer iterates through it

JGroos-16 requested a review from sjanssen2 July 11, 2022 14:44

sjanssen2 approved these changes Jul 12, 2022

JGroos-16 merged commit e22decf into master Jul 12, 2022
