-
Notifications
You must be signed in to change notification settings - Fork 1
New feature/fwd and rev reads v2 #6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ce it wasnt necessary
…put path and directory path of fastq files was identical.
…e 'most probable regions'. Added a collumn to keep track of how many reads are mapped to conserved regions.
…w_Feature/fwd-and-rev-reads
README.md
Outdated
Module: os | ||
Module: subprocess | ||
Module: argparse | ||
Module: itertools | ||
Module: tempfile | ||
Module: shutil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea to document the used dependencies, but all listed modules are default python modules that should always be shipped with python installations. Thus, this list is too detailed. If you are going to use non-default libraries like e.g. pandas
or seaborn
we can list them here AND have to add them to the environment.yml
file
README.md
Outdated
To Do List: | ||
-Currently the used programms are refrenced by hardcoded paths | ||
-> not usable on PCs where these programms are saved somewhere else | ||
-Currently only the single most probable region is shown. | ||
-> Modify in a way that reads spanning multiple regions are shown correctly | ||
-Adding other functions such as primer verification or making the programm more efficiant |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a to-do list is better realized as one more multiple issues in github
vxdetector/Output_counter.py
Outdated
dictionary['V1'] = round((dictionary['V1'] / count) * aligned_count, 2) | ||
dictionary['V2'] = round((dictionary['V2'] / count) * aligned_count, 2) | ||
dictionary['V3'] = round((dictionary['V3'] / count) * aligned_count, 2) | ||
dictionary['V4'] = round((dictionary['V4'] / count) * aligned_count, 2) | ||
dictionary['V5'] = round((dictionary['V5'] / count) * aligned_count, 2) | ||
dictionary['V6'] = round((dictionary['V6'] / count) * aligned_count, 2) | ||
dictionary['V7'] = round((dictionary['V7'] / count) * aligned_count, 2) | ||
dictionary['V8'] = round((dictionary['V8'] / count) * aligned_count, 2) | ||
dictionary['V9'] = round((dictionary['V9'] / count) * aligned_count, 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
avoid this type of code duplication by using a loop:
for vregion in ['V1','V2']:
dictionary[vregion] = round((dictionary[vregion] / count) * aligned_count, 2)
vxdetector/Output_counter.py
Outdated
return sum(buf.count(b'\n') for buf in bufgen) | ||
|
||
|
||
def region_count(dictionary, unaligned_count, temp_path, all_reads, mode): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might be worth switching to pandas
instead of operating on dictionaries: https://pandas.pydata.org/
vxdetector/Output_counter.py
Outdated
count = sum([dictionary['V1'], dictionary['V2'], dictionary['V3'], | ||
dictionary['V4'], dictionary['V5'], dictionary['V6'], | ||
dictionary['V7'], dictionary['V8'], dictionary['V9'], no_V]) | ||
dictionary['V1'] = round((dictionary['V1'] / count) * aligned_count, 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should differentiate between internal computation (= no need to chomp decimal places) and user report (= no need to show more than 2 digits)
vxdetector/Output_counter.py
Outdated
V1 = str(dictionary['V1']) | ||
V2 = str(dictionary['V2']) | ||
V3 = str(dictionary['V3']) | ||
V4 = str(dictionary['V4']) | ||
V5 = str(dictionary['V5']) | ||
V6 = str(dictionary['V6']) | ||
V7 = str(dictionary['V7']) | ||
V8 = str(dictionary['V8']) | ||
V9 = str(dictionary['V9']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use a loop!
vxdetector/Output_counter.py
Outdated
print(f'\n{str(unaligned_count)}% were unaligned.') | ||
print(f'Of all the aligned Reads most were aligned to: {most_probable_V}') | ||
print('The probabilities of all regions is as follows [%]:') | ||
print(f'V1: {V1}\nV2: {V2}\nV3: {V3}\nV4: {V4}\nV5: {V5}\n \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, a loop might be easier: '\n'.join(['V1', 'V2'])
vxdetector/Output_counter.py
Outdated
BED_path = temp_path + 'BED.bed' | ||
Log_path = temp_path + 'bowtie2.log' | ||
def count(temp_path, file_name, file_type, path, dir_name, dir_path, mode): | ||
dictionary = {'V1': 0, 'V2': 0, 'V3': 0, 'V4': 0, 'V5': 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should have a global constant listing the variable region names. It would than be easy to loop through these regions. Should we add/remove one, we would only have to update in a single location!
vxdetector/files_manager.py
Outdated
"""findet die richtigen ordner/dateien, damit das programm funktioniert""" | ||
programm_path = os.path.dirname(__file__) + '/' | ||
if os.path.exists(programm_path+'Output/'): | ||
# findet die richtigen ordner/dateien, damit das programm funktioniert |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you please translate to English, should we ever pass this code base to the UCSD guys
|
||
|
||
def buildbowtie2(path): | ||
bowtie2_path = '/vol/software/bin/bowtie2-build' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we will need a better mechanism to specify binary program locations like a config file or cmd line arguments
…nary and changed the way programs such as bowtie2 are called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me, just a few tiny change requests
vxdetector/Output_counter.py
Outdated
import csv | ||
from itertools import (takewhile, repeat) | ||
|
||
dictionary = {'V1': 0, 'V2': 0, 'V3': 0, 'V4': 0, 'V5': 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great! Can you give the variable a more speaking name, like 'regions' or 'fragments' or ....
vxdetector/Output_counter.py
Outdated
dir_path = dir_name.replace(dir_path, '', 1) | ||
# only leaves the part in the directory tree between | ||
# the file and the original given directory path | ||
if dir_name.split('/')[-1] == '': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't that the same as os.path.dirname
or os.path.basename
?
vxdetector/Output_counter.py
Outdated
writer.writeheader() | ||
for key in dictionary: | ||
dictionary[key] = round(dictionary[key], 2) | ||
writer.writerow({'Read-file': file_name, 'Number of Reads': all_reads, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since you now work with a global variable, we should avoid manual iteration of keys here. How about
foo = {'Read-file': file_name, 'Number of Reads': all_reads, ...}
foo.update(dictionary)
…removed indexed bowtie2 files; renamed global dictionary and create_output no longer iterates through it
region = {'V1_start': '', 'V1_end': '', | ||
'V2_start': '', 'V2_end': '', | ||
'V3_start': '', 'V3_end': '', | ||
'V4_start': '', 'V4_end': '', | ||
'V5_start': '', 'V5_end': '', | ||
'V6_start': '', 'V6_end': '', | ||
'V7_start': '', 'V7_end': '', | ||
'V8_start': '', 'V8_end': '', | ||
'V9_start': '', 'V9_end': ''} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you might want to use a global dictionary as in the main program here as well to enable use of loops!
No description provided.