Skip to content

Conversation

JGroos-16
Copy link
Contributor

No description provided.

@JGroos-16 JGroos-16 requested a review from sjanssen2 July 6, 2022 13:54
README.md Outdated
Comment on lines 20 to 25
Module: os
Module: subprocess
Module: argparse
Module: itertools
Module: tempfile
Module: shutil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea to document the used dependencies, but all listed modules are default python modules that should always be shipped with python installations. Thus, this list is too detailed. If you are going to use non-default libraries like e.g. pandas or seaborn we can list them here AND have to add them to the environment.yml file

README.md Outdated
Comment on lines 27 to 30
To Do List:
-Currently the used programms are refrenced by hardcoded paths
-> not usable on PCs where these programms are saved somewhere else
-Currently only the single most probable region is shown.
-> Modify in a way that reads spanning multiple regions are shown correctly
-Adding other functions such as primer verification or making the programm more efficiant
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a to-do list is better realized as one more multiple issues in github

Comment on lines 43 to 51
dictionary['V1'] = round((dictionary['V1'] / count) * aligned_count, 2)
dictionary['V2'] = round((dictionary['V2'] / count) * aligned_count, 2)
dictionary['V3'] = round((dictionary['V3'] / count) * aligned_count, 2)
dictionary['V4'] = round((dictionary['V4'] / count) * aligned_count, 2)
dictionary['V5'] = round((dictionary['V5'] / count) * aligned_count, 2)
dictionary['V6'] = round((dictionary['V6'] / count) * aligned_count, 2)
dictionary['V7'] = round((dictionary['V7'] / count) * aligned_count, 2)
dictionary['V8'] = round((dictionary['V8'] / count) * aligned_count, 2)
dictionary['V9'] = round((dictionary['V9'] / count) * aligned_count, 2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avoid this type of code duplication by using a loop:

for vregion in ['V1','V2']:
    dictionary[vregion] = round((dictionary[vregion] / count) * aligned_count, 2)

return sum(buf.count(b'\n') for buf in bufgen)


def region_count(dictionary, unaligned_count, temp_path, all_reads, mode):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be worth switching to pandas instead of operating on dictionaries: https://pandas.pydata.org/

count = sum([dictionary['V1'], dictionary['V2'], dictionary['V3'],
dictionary['V4'], dictionary['V5'], dictionary['V6'],
dictionary['V7'], dictionary['V8'], dictionary['V9'], no_V])
dictionary['V1'] = round((dictionary['V1'] / count) * aligned_count, 2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should differentiate between internal computation (= no need to chomp decimal places) and user report (= no need to show more than 2 digits)

Comment on lines 105 to 113
V1 = str(dictionary['V1'])
V2 = str(dictionary['V2'])
V3 = str(dictionary['V3'])
V4 = str(dictionary['V4'])
V5 = str(dictionary['V5'])
V6 = str(dictionary['V6'])
V7 = str(dictionary['V7'])
V8 = str(dictionary['V8'])
V9 = str(dictionary['V9'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a loop!

print(f'\n{str(unaligned_count)}% were unaligned.')
print(f'Of all the aligned Reads most were aligned to: {most_probable_V}')
print('The probabilities of all regions is as follows [%]:')
print(f'V1: {V1}\nV2: {V2}\nV3: {V3}\nV4: {V4}\nV5: {V5}\n \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, a loop might be easier: '\n'.join(['V1', 'V2'])

BED_path = temp_path + 'BED.bed'
Log_path = temp_path + 'bowtie2.log'
def count(temp_path, file_name, file_type, path, dir_name, dir_path, mode):
dictionary = {'V1': 0, 'V2': 0, 'V3': 0, 'V4': 0, 'V5': 0,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should have a global constant listing the variable region names. It would than be easy to loop through these regions. Should we add/remove one, we would only have to update in a single location!

"""findet die richtigen ordner/dateien, damit das programm funktioniert"""
programm_path = os.path.dirname(__file__) + '/'
if os.path.exists(programm_path+'Output/'):
# findet die richtigen ordner/dateien, damit das programm funktioniert
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please translate to English, should we ever pass this code base to the UCSD guys



def buildbowtie2(path):
bowtie2_path = '/vol/software/bin/bowtie2-build'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will need a better mechanism to specify binary program locations like a config file or cmd line arguments

…nary and changed the way programs such as bowtie2 are called.
Copy link
Member

@sjanssen2 sjanssen2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me, just a few tiny change requests

import csv
from itertools import (takewhile, repeat)

dictionary = {'V1': 0, 'V2': 0, 'V3': 0, 'V4': 0, 'V5': 0,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great! Can you give the variable a more speaking name, like 'regions' or 'fragments' or ....

dir_path = dir_name.replace(dir_path, '', 1)
# only leaves the part in the directory tree between
# the file and the original given directory path
if dir_name.split('/')[-1] == '':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't that the same as os.path.dirname or os.path.basename?

writer.writeheader()
for key in dictionary:
dictionary[key] = round(dictionary[key], 2)
writer.writerow({'Read-file': file_name, 'Number of Reads': all_reads,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since you now work with a global variable, we should avoid manual iteration of keys here. How about

foo = {'Read-file': file_name, 'Number of Reads': all_reads, ...}
foo.update(dictionary)

…removed indexed bowtie2 files; renamed global dictionary and create_output no longer iterates through it
@JGroos-16 JGroos-16 requested a review from sjanssen2 July 11, 2022 14:44
Comment on lines +47 to +55
region = {'V1_start': '', 'V1_end': '',
'V2_start': '', 'V2_end': '',
'V3_start': '', 'V3_end': '',
'V4_start': '', 'V4_end': '',
'V5_start': '', 'V5_end': '',
'V6_start': '', 'V6_end': '',
'V7_start': '', 'V7_end': '',
'V8_start': '', 'V8_end': '',
'V9_start': '', 'V9_end': ''}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you might want to use a global dictionary as in the main program here as well to enable use of loops!

@JGroos-16 JGroos-16 merged commit e22decf into master Jul 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants