
Sessions 3 and 4: Practicing with the tools

Juan Gonzalez-Gomez edited this page Feb 3, 2019 · 36 revisions

Sessions 3 and 4. Week 2: Tools

  • Goals:
    • Brief introduction to the Ensembl genome browser
    • Practicing with GitHub and PyCharm
  • Time: 4h
  • Dates: Week 2: Tuesday, Jan-29th. Wednesday, Jan-30th

Practice 0

During the previous sessions (1 and 2) we learned how to use GitHub and PyCharm, practicing with simple Python programs. During the next two sessions (3 and 4) you will have time to finish the exercises in the laboratory and to organize your files

  • Goals:

    • Finish the exercises proposed in sessions 1 and 2
    • You should place all the files in the P0 folder of your 2018-19-PNE-practices GitHub repository
  • Recap of files you should have in your P0 Folder

Python script name    Description
hello.py              Hello world example. It just prints some strings
print_numbers.py      Printing the numbers from 1 to 20, one number per line
sum_100.py            Adding the numbers from 1 to 100 and printing the result
sum_n.py              Adding the numbers from 1 to n, where n is a parameter defined in the code. It should be implemented by calling a function that performs the addition of the first n integers (1+2+3+...+n)
fibonacci.py          It prints the first n terms of the Fibonacci series. The n parameter is a constant inside the code
fibonacci_sum.py      Adding the Fibonacci terms from 1 to n. The n parameter should be entered by the user
dna_count.py          Count the number of bases in a DNA sequence. The sequence is input by the user
dna_count_file.py     The same as dna_count, but the sequence is read from the dna.txt file
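
As a rough sketch of what sum_n.py might look like: the function name sum_n and the value chosen for n are only illustrative assumptions, and your own solution may be organized differently.

```python
def sum_n(n):
    """Return 1 + 2 + ... + n using a simple loop."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


# n is a parameter defined in the code, as the exercise asks
n = 100
print("Sum of the first", n, "integers:", sum_n(n))
```

Keeping the addition inside a function makes it easy to reuse the same code with another value of n.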

This is what your GitHub 2018-19-PNE repo should look like:

Inside the P0 folder you should have the files for Practice 0

Introduction to the Ensembl web database

ensembl.org is one of several well-known genome browsers for retrieving genomic information. It can be used directly from the web page, but it can also be used by accessing its databases directly through a REST API
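
To give a first idea of what accessing the database through a REST API can look like, here is a minimal sketch. It assumes the public server at rest.ensembl.org and its lookup/symbol endpoint, with homo_sapiens as the species name. The function names lookup_url and fetch_gene_info are hypothetical helpers made up for this example. Check the Ensembl REST documentation before relying on any of these details.

```python
import json
import urllib.request

SERVER = "https://rest.ensembl.org"  # assumed public REST server


def lookup_url(species, symbol):
    """Build the URL that asks for basic information about a gene symbol."""
    return f"{SERVER}/lookup/symbol/{species}/{symbol}?content-type=application/json"


def fetch_gene_info(species, symbol):
    """Download and decode the JSON answer (needs an internet connection)."""
    with urllib.request.urlopen(lookup_url(species, symbol)) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(lookup_url("homo_sapiens", "FRAT1"))
    # Uncomment to make the real request:
    # info = fetch_gene_info("homo_sapiens", "FRAT1")
    # print(info)
```

The request itself is left commented out so the sketch can be run without a network connection. Only the URL is printed.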

You can find more information on the Ensembl Wikipedia page

As BIO-specialists, you should be able to use the databases and understand the underlying concepts: genome, DNA, bases and so on. These are concepts that computer scientists and electronic engineers are not familiar with

As engineers, you should understand the technology behind these databases: the networks, the internet architecture, the protocols, the formats... and how to create applications that can access these databases. This is what this subject, Programming in Network Environments, is about

Formats

The information located in the databases is available in different formats. The most important are:

  • FASTA: for representing either nucleotide sequences or peptide sequences. This is a format used in Bioinformatics applications
  • JSON: Generic format for representing any data object
  • XML: a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable

As engineers we should understand these formats and be able to create programs that read data in those formats and write information with them
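
The following sketch shows the same toy record written in the three formats and read back with Python. The record name and the sequence are made up for illustration, and read_fasta is a hypothetical helper that only handles a text with a single record. JSON and XML are read with the standard json and xml.etree.ElementTree modules.

```python
import json
import xml.etree.ElementTree as ET

# The same toy record (made-up name and sequence) in the three formats
fasta_text = ">toy_gene example record\nACGT\nTTGA\n"
json_text = '{"id": "toy_gene", "seq": "ACGTTTGA"}'
xml_text = '<record id="toy_gene"><seq>ACGTTTGA</seq></record>'


def read_fasta(text):
    """Return (header, sequence) for a FASTA text with a single record."""
    lines = text.strip().splitlines()
    header = lines[0][1:]          # drop the leading ">"
    sequence = "".join(lines[1:])  # the sequence may span several lines
    return header, sequence


header, fasta_seq = read_fasta(fasta_text)
json_seq = json.loads(json_text)["seq"]
xml_seq = ET.fromstring(xml_text).find("seq").text

print(header)
print(fasta_seq, json_seq, xml_seq)
```

Note that FASTA needs a few lines of our own code, while JSON and XML are generic formats with ready-made readers in the standard library.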

Activities

Just to get familiar with the Ensembl web page, let's play a little bit:

  • Go to the ensembl.org web page and have a look (5 minutes)
  • Go to the Chromosome page on Wikipedia. Find the link to the human chromosome 10 and open its page
  • Have a look to the right. Can you find the link to chromosome 10 in Ensembl? Follow it and play (5 minutes)
  • From the chromosome 10 Wikipedia page: find the FRAT1 gene and open its page
  • FRAT1 page: Can you find the link to the Ensembl database? Follow it
  • Let's download the data of the FRAT1 gene. Click on Export data (on the left)
  • Then click on Next. Select the Text format (for example)
  • You will see the FRAT1 gene in FASTA format in the browser
  • Let's save that information into a file. Go back and right click on the Text link. Click on Save link as. Save the file as FRAT1.txt
  • Go to your computer's local file manager and open the file
  • Congrats! You've downloaded your first gene sequence! :-)

Practice 0-extra (Optional)

If you have finished Practice 0 and want to learn more, here are two optional extra exercises

  • You should store them in your repository, inside the P0-extra folder (at the same level as the P0 folder)

Extra exercise 1

  • Download the file corresponding to the CPLX2 gene, located on human chromosome 5 (name it CPLX2.txt)
  • Write a Python program (CPLX2_print.py) that opens that file and prints all the data contained in it
  • Add the two files CPLX2.txt and CPLX2_print.py to your GitHub repo, in the P0-extra folder

Extra exercise 2

  • Create a Python program that opens the CPLX2.txt file and counts the number of bases: A, C, T and G. You should ignore the lines that start with the ">" character (these lines contain information about the sequence, but they are not part of the gene). Name the program CPLX2_count.py
  • Upload the program to your GitHub account, in the P0-extra folder
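
One possible way to approach the counting is sketched below. The function name count_bases and the sample lines are made up for illustration. The sketch works on any iterable of lines, so it can be tried on a small list before using the real file.

```python
def count_bases(lines):
    """Count A, C, G and T in an iterable of FASTA lines, skipping headers."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    for line in lines:
        if line.startswith(">"):
            continue  # header line: information about the sequence
        for base in line.strip().upper():
            if base in counts:
                counts[base] += 1
    return counts


# Small made-up example; with the real file you could pass the open file:
#     with open("CPLX2.txt") as f:
#         print(count_bases(f))
sample = [">made-up header\n", "ACGT\n", "AAGC\n"]
print(count_bases(sample))
```

Characters other than A, C, G and T (for example N or the newline) are simply ignored in this sketch.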

Authors

Credits

  • Alvaro del Castillo. He designed and created the original content of this subject. Thanks a lot :-)

