Commit 0513d89

Initial commit
0 parents  commit 0513d89

6 files changed: +772 -0 lines changed

.github/workflows/classroom.yml

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
name: Autograding Tests
'on':
- push
- repository_dispatch
permissions:
  checks: write
  actions: read
  contents: read
jobs:
  run-autograding-tests:
    runs-on: ubuntu-latest
    if: github.actor != 'github-classroom[bot]'
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
    - name: ".tests/orf_test.py"
      id: tests-orf-test-py
      uses: classroom-resources/autograding-python-grader@v1
      with:
        timeout: 10
        max-score: 10
        setup-command: ''
    - name: Autograding Reporter
      uses: classroom-resources/autograding-grading-reporter@v1
      env:
        TESTS-ORF-TEST-PY_RESULTS: "${{steps.tests-orf-test-py.outputs.result}}"
      with:
        runners: tests-orf-test-py

.tests/orf_test.py

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@

from orf import find_all_starts, find_first_in_register_stop, all_orfs_range, longest_orf

def test_find_all_starts():
    assert find_all_starts("") == list()
    assert find_all_starts("GGAGACGACGCAAAAC") == list()
    assert find_all_starts("AAAAAAATGAAATGAGGGGGGTATG") == [6, 11, 22]
    assert find_all_starts("GGATGATGATGTAAAAC") == [2, 5, 8]
    assert find_all_starts("GGATGCATGATGTAGAAC") == [2, 6, 9]
    assert find_all_starts("GGGATGATGATGGGATGGTGAGTAGGGTAAG") == [3, 6, 9, 14]
    assert find_all_starts("GGGatgatgatgGGatgGtgaGtagGGACtaaG".upper()) == [3, 6, 9, 14]

def test_find_first_in_register_stop():
    assert find_first_in_register_stop("") == -1
    assert find_first_in_register_stop("GTAATAGTGA") == -1
    assert find_first_in_register_stop("AAAAAAAAAAAAAAATAAGGGTAA") == 18
    assert find_first_in_register_stop("AAAAAACACCGCGTGTACTGA") == 21

def test_all_orfs():
    assert all_orfs_range("") == list()
    assert all_orfs_range("GGAGACGACGCAAAAC") == list()
    assert all_orfs_range("AAAAAAATGAAATGAGGGGGGTATG") == [(6, 15)]
    assert all_orfs_range("GGATGATGATGTAAAAC") == [(2, 14),(5, 14),(8,14)]
    assert all_orfs_range("GGATGCATGATGTAGAAC") == [(6, 15), (9, 15)]
    assert all_orfs_range("GGGATGATGATGGGATGGTGAGTAGGGTAAG") == [(3, 21),(6, 21), (9, 21)]
    assert all_orfs_range("GGGatgatgatgGGatgGtgaGtagGGACtaaG".upper()) == [(3, 21), (6, 21), (9, 21), (14, 32)]

def test_longest_orf():
    assert longest_orf("") == ""
    assert longest_orf("GGAGACGACGCAAAAC") == ""
    assert longest_orf("AAAAAAATGAAATGAGGGGGGTATG") == "ATGAAATGA"
    assert longest_orf("GGATGATGATGTAAAAC") == "ATGATGATGTAA"
    assert longest_orf("GGATGCATGATGTAGAAC") == "ATGATGTAG"
    assert longest_orf("GGGATGATGATGGGATGGTGAGTAGGGTAAG") == "ATGATGATGGGATGGTGA"
    assert longest_orf("GGGatgatgatgGGatgGtgaGtagGGACtaaG") in ["atgGtgaGtagGGACtaa","atgatgatgGGatgGtga"]

README.md

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
# DNA Sequence Analysis Toolkit

This toolkit provides a series of functions for analyzing DNA sequences: locating start codons, finding in-register stop codons, identifying open reading frames (ORFs), and extracting the longest ORF within a sequence.

## Overview

The project is divided into the following steps:

1. **Finding all start codons**: Locate all occurrences of the `ATG` start codon.
2. **Finding in-register stop codons**: Identify the first stop codon (`TGA`, `TAG`, or `TAA`) in-frame.
3. **Identifying all ORFs**: Determine the ranges of all open reading frames in the sequence.
4. **Finding the longest ORF**: Extract the longest open reading frame from the sequence.

---

## Steps and Implementation Details

### Step 1: Implementing the `find_all_starts` Function
This function scans a DNA sequence and returns a list of the 0-based starting positions of every `ATG` codon, in any reading frame. If the sequence contains no `ATG`, it returns an empty list.
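
One possible way to approach this step is sketched below. It is only an illustration, not the required implementation: it assumes the 0-based positions that `.tests/orf_test.py` expects and uses Python's built-in `str.find` to jump from one `ATG` to the next.

```python
def find_all_starts(dna):
    """Return the 0-based index of every "ATG" in dna, in ascending order."""
    starts = []
    pos = dna.find("ATG")
    while pos != -1:
        starts.append(pos)
        # Resume one character later so no later start codon is missed.
        pos = dna.find("ATG", pos + 1)
    return starts


print(find_all_starts("GGATGATGATGTAAAAC"))  # [2, 5, 8]
```

Checking every index with a slice comparison, as `orf_solution.py` does, gives the same result.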

### Step 2: Implementing the `find_first_in_register_stop` Function
You will write a function called `find_first_in_register_stop` that scans a DNA sequence one codon at a time, starting at index 0 and advancing three nucleotides per step, and finds the first stop codon (`TGA`, `TAG`, or `TAA`) in that register. The function returns the index immediately after that stop codon, or -1 if no in-register stop codon is found.
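
A minimal sketch of the in-register scan follows, assuming the return convention used in `.tests/orf_test.py` (the index just past the stop codon, or -1). The `STOP_CODONS` constant is a name introduced here for illustration.

```python
STOP_CODONS = {"TGA", "TAG", "TAA"}  # illustrative constant, not part of the starter code


def find_first_in_register_stop(dna):
    """Return the index just past the first in-register stop codon, or -1."""
    # Visit indices 0, 3, 6, ... so only complete codons in frame 0 are checked.
    for i in range(0, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            return i + 3
    return -1


print(find_first_in_register_stop("AAAAAAAAAAAAAAATAAGGGTAA"))  # 18
print(find_first_in_register_stop("GTAATAGTGA"))  # -1
```

In the second call the sequence contains `TAA`, `TAG`, and `TGA`, but none of them starts at a multiple of 3, so the result is -1.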

### Step 3: Implementing the `all_orfs_range` Function
You will implement a function called `all_orfs_range` that finds every start codon (`ATG`) in a DNA sequence, pairs it with the first in-register stop codon (`TGA`, `TAG`, or `TAA`) that follows it, and returns a list of `(start, end)` tuples, where `end` is the index immediately after the stop codon. Start codons with no in-register stop codon are skipped.
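
The sketch below shows one way the pairing could work. To stay self-contained it inlines the start and stop searches instead of calling the functions from Steps 1 and 2; combining those two helpers is an equally valid design. `STOP_CODONS` is again an illustrative name.

```python
STOP_CODONS = ("TGA", "TAG", "TAA")  # illustrative constant


def all_orfs_range(dna):
    """Return (start, end) for every ATG that has an in-register stop codon."""
    orfs = []
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the frame set by this start codon.
        for i in range(start + 3, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                orfs.append((start, i + 3))  # end is just past the stop codon
                break
    return orfs


print(all_orfs_range("GGATGCATGATGTAGAAC"))  # [(6, 15), (9, 15)]
```

The `ATG` at index 2 is dropped here because its frame (`ATG CAT GAT GTA GAA`) never reaches a stop codon.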

### Step 4: Implementing the `longest_orf` Function
You will implement a function called `longest_orf` that finds all open reading frames (ORFs) in a DNA sequence and returns the longest one as a string, or an empty string if the sequence contains no ORF.
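
The selection step can be illustrated in isolation. The helper below, `longest_from_ranges`, is hypothetical and not part of the assignment: it assumes the `(start, end)` ranges have already been computed (for example by `all_orfs_range`) and only shows the slicing and comparison.

```python
def longest_from_ranges(dna, orf_ranges):
    """Hypothetical helper: slice each (start, end) range and keep the longest."""
    # max() returns the first longest slice on ties; default="" covers an empty list.
    return max((dna[start:end] for start, end in orf_ranges), key=len, default="")


dna = "GGATGATGATGTAAAAC"
ranges = [(2, 14), (5, 14), (8, 14)]  # the ranges the tests expect for this sequence
print(longest_from_ranges(dna, ranges))  # ATGATGATGTAA
```

Because the slices are taken from whatever string is passed in, computing the ranges on `dna.upper()` and slicing the original `dna` keeps the input's letter case in the result, which is what the mixed-case test in `.tests/orf_test.py` relies on.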

orf.ipynb

Lines changed: 634 additions & 0 deletions
Large diffs are not rendered by default.

orf.py

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
def find_all_starts(dna):
    pass # Replace the `pass` with your code


def find_first_in_register_stop(dna):
    pass # Replace the `pass` with your code


def all_orfs_range(dna):
    pass # Replace the `pass` with your code


def longest_orf(dna):
    pass # Replace the `pass` with your code

orf_solution.py

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
def find_all_starts(dna):
    # Collect the 0-based index of every "ATG", in any reading frame.
    starts = []
    for i in range(len(dna)):
        if dna[i: i+3] == "ATG":
            starts.append(i)
    return starts

def find_first_in_register_stop(dna):
    # Step through dna one codon at a time, starting at index 0.
    for i in range(0, len(dna), 3):
        if dna[i:i+3] in ["TGA", "TAG", "TAA"]:
            return i+3  # index just past the stop codon
    return -1

def all_orfs_range(dna):
    starts = find_all_starts(dna)
    orfs_ranges = []
    for start in starts:
        # The stop search is relative to `start`, so add it back for the range end.
        end = find_first_in_register_stop(dna[start:])
        if end != -1:
            orfs_ranges.append((start, start+end))
    return orfs_ranges

def longest_orf(dna):
    longest = ""
    # Scan an uppercased copy, but slice the original so its case is preserved.
    for orf_range in all_orfs_range(dna.upper()):
        orf = dna[orf_range[0]: orf_range[1]]
        if len(orf) > len(longest):
            longest = orf
    return longest
