Python-Assignment

This is working folder for my Python assignment under URI Spring 2018 Bio-539 'Big Data Analysis' class. The task is to count the possible and observed kmers from the file nd2.fasta, compute the linguistic complexity and produce graphs and dataframe for observed vs possible kmers for every sequence of the file.

I subset the nd2.fasta file and made jeeban.fasta taking only 4 sequences from the original file for handling it easily, but the .py script can handle any .fasta files.

There were few non-DNA characters (other than ATGC), I have written a code into the main command file (python_script_Jeeban.py) which counts those characters and produce graphs if that is applicable to a sequence.

If you need any information from this project, you can write me at jeeban_panthi@uri.edu

To run the script: python scrip_name data_file

For my case it is: python python_script_Jeeban.py jeeban.fasta

This script works only for the .fasta file. You can change the code to work with other format of data.

To check the outputs: Go to the output folder, there are two folder into it: one is for the data outputs and the other is for the graphs.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
output		output
README.md		README.md
jeeban.fasta		jeeban.fasta
nd2.fasta		nd2.fasta
python_script_Jeeban.py		python_script_Jeeban.py
test_Jeeban.py		test_Jeeban.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Python-Assignment

About

Uh oh!

Releases

Packages

Languages

panthijeeban/Python-Assignment

Folders and files

Latest commit

History

Repository files navigation

Python-Assignment

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages