## Coding 1:

In [1]:
from Bio import Entrez
Entrez.email = 'sahasra@uchicago.edu'

class Macro():
    def __init__(self, db, id, rettype='fasta', retmode='text'):
        record = Entrez.efetch(db = db, id = id, rettype = rettype, retmode = retmode) #uses Entrez package to retrieve fasta seq info from NCBI
        recordInfo = record.readline().split(" ") #splits header line on spaces: token 0 is the accession number, tokens 1 and 2 are kept as macro name and species
        with open('%s.txt' % recordInfo[1], 'w') as file: #names text file after the macro for easier identification of appropriate file
            file.write(" ".join(recordInfo)) #writes header back with its spaces restored (writelines would run the tokens together)
            self.sequence = record.read() #rest of the record; NOTE: still contains the line breaks of the FASTA text
            file.write(self.sequence) #writes sequence into file after writing header(which was read in first)
        record.close() #releases the connection once header and sequence have been read
        self.accessionNum = recordInfo[0].lstrip('>') #drops the leading '>' of the FASTA header
        self.macroType = db #saves database info as macromolecule type
        self.macroName = recordInfo[1]
        self.macroSpecies = recordInfo[2]

    def get_macro_accessNum(self):
        return self.accessionNum 

    def set_macro_name(self, newName):
        self.macroName = newName #resets macro name in case of lack of specificity/over-specificity, or inaccuracy
    
    def seqLength(self):
        return len(self.sequence) 

    def convertToLower(self):
        return self.sequence.lower() #gives sequence in lowercase, if user wishes for lowercase view of seq

class Protein(Macro):
    def __init__(self, id, db="protein"):
        super().__init__(db,id) #protein subclass initialization requires specification of only 1 arg

    def findOccsOfAA(self, AA=""):
        count = 0
        AA = AA.upper()
        for c in self.sequence:
            if c==AA:
                count+=1 #counts all occurrences of specified amino acid in protein seq
        return count

class Nucleotide(Macro):
    def __init__(self, id, db='nucleotide'):
        super().__init__(db, id) #nucleotide subclass initialization requires specification of only 1 arg

    def codonCount(self):
        return len(self.sequence)//3 #returns number of complete 3-base frames in nuc sequence (a trailing partial frame is not counted)
    
    def searchCodonAndOccs(self, cod):
        codList = []
        cod = cod.upper()
        for i in range(0,len(self.sequence),3): #iterates frames of 3 bases
            if self.sequence[i:i+3] == cod: #checks if any frame matches specified codon to be searched for
                codList.append(i+1) #adds all pos(NOT index) of start of codon to list
        return codList,len(codList) #returns list of positions as well as total number of occurrences of codon

### example of use of class and subclasses with initialization of subclasses and implementation of class methods ###
#myMacro = Macro(db='protein', id='KAI4025714.1')
myProt = Protein(id='KAI4025714.1')
myNuc = Nucleotide(id='AB046569.1')
print(myNuc.searchCodonAndOccs("ttt"))
    

([1, 139, 271, 502, 526, 886, 919, 1354, 1636, 1663, 2047, 2074, 2134, 2227, 2302, 2320, 2503, 2521, 2560], 19)


#### Macro Class Description:
   The Macro class defined above fetches FASTA information about proteins, nucleotides, and other macromolecules from the NCBI database. Depending on what the user already knows about the sequence they want, they can initialize an object under a specific subclass or, more generically, under the Macro class. The Macro class gives easy access to the macromolecule type, the species it is from, the accession number, and the full sequence. In addition, the class methods let the user quickly retrieve this information and derive other information from it, such as the length of the sequence. If the user finds it more convenient, they can also view the sequence in lowercase using convertToLower(). The subclasses Protein and Nucleotide show two ways in which the Macro class can be extended: Protein defines a method that counts occurrences of an amino acid in a protein sequence, and Nucleotide defines methods that count codons and return the positions and count of a codon being searched for.
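
   As a side note on how the header and sequence can be separated, here is a minimal offline sketch. It assumes the FASTA text arrives as a single string with a `>` header line followed by wrapped sequence lines. The helper names `parse_fasta_text` and `codon_positions` and the example record are made up for illustration and are not part of the class above. Stripping the line breaks first matters because position-based methods such as length and codon search otherwise count newline characters too.

```python
def parse_fasta_text(fasta_text):
    """Split one FASTA record into (accession, description, sequence)."""
    header, _, body = fasta_text.partition("\n")
    # First token after '>' is the accession; the rest is free-text description
    accession, _, description = header.lstrip(">").partition(" ")
    sequence = "".join(body.split())  # drop line breaks and stray whitespace
    return accession, description, sequence


def codon_positions(sequence, codon):
    """Return 1-based start positions of `codon` in reading frame 1."""
    codon = codon.upper()
    return [i + 1 for i in range(0, len(sequence) - 2, 3)
            if sequence[i:i + 3] == codon]


# Made-up record for illustration only (not a real NCBI entry)
example = ">XX000001.1 hypothetical gene [Example species]\nTTTAAATTT\nGGGTTT\n"
acc, desc, seq = parse_fasta_text(example)
print(acc, len(seq), codon_positions(seq, "ttt"))  # XX000001.1 15 [1, 7, 13]
```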

## Coding 2:

In [12]:
from functools import reduce 
import random
import math
userIn = input("Please enter a list of whole numbers separated by spaces: ").split() #takes in user input of integers; split() with no argument tolerates repeated or trailing spaces
messyList = [int(i) for i in userIn] #casts each item in user input list as integer

#(1)-> Using for loop:
cleanList1 = [num for num in messyList if round(math.sqrt(num))**2==num] #uses comprehension to retain integers that are perfect squares

#(1)-> Using lambda:
cleanList2 = list(filter(lambda nums: (round(math.sqrt(nums))**2==nums), messyList)) #uses lambda to keep integers that are perfect squares

print("with for loop: ", cleanList1)
print("with lambda: ", cleanList2)
######################

randList = [random.randint(0,25) for i in range(12)]
#(2)-> Using for loop:
lSum = 0
for i in randList: #uses for loop to get sum of elements in randList
    lSum+=i
listAvg1 = lSum/len(randList)
print(listAvg1)

#(2)-> Using lambda:
# source = https://www.geeksforgeeks.org/python-lambda-anonymous-functions-filter-map-reduce/
listAvg2 = (reduce(lambda i, j: i + j, randList)/len(randList)) #uses lambda and reduce to get list average by adding list elements first then dividing by list length
print(listAvg2)

Please enter a list of whole numbers separated by spaces: 2 4
with for loop:  [4]
with lambda:  [4]
12.5
12.5
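
A possible variation on the perfect-square test and the average, shown only as a sketch: `math.isqrt` works on integers directly, so it avoids floating-point rounding for very large inputs, and the explicit `n >= 0` check avoids the error that `math.sqrt` raises for negative numbers. The helper name `is_perfect_square` and the sample values are illustrative and not taken from the cell above. Both `math.isqrt` and `statistics.fmean` assume Python 3.8 or newer.

```python
import math
from statistics import fmean


def is_perfect_square(n):
    """True when n is a non-negative integer with an integer square root."""
    return n >= 0 and math.isqrt(n) ** 2 == n


numbers = [2, 4, 9, 10, -4, 10**30]
squares = [n for n in numbers if is_perfect_square(n)]
print(squares)             # [4, 9, 1000000000000000000000000000000]
print(fmean([3, 5, 10]))   # 6.0
```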


## Coding 3:

In [4]:
import timeit #uses timeit module to note the efficiency of each version of code and check whether the changes made improved code efficiency 
# source for help with timeit: https://www.geeksforgeeks.org/timeit-python-examples/

#(1)--> dot product square 2x2 matrices
##(a: inefficient):
mysetup = "import random"
testcode = '''
# initializes two random 2x2 matrices of integers
myRandMat1 = [[random.randint(0,10) for i in range(2)] for y in range(2)]
myRandMat2 = [[random.randint(0,10) for i in range(2)] for y in range(2)]

#method to transpose second matrix used in dot product
def matTranspose(mat): #source(not used exactly): https://www.geeksforgeeks.org/transpose-matrix-single-line-python/
    tMat = []
    for i in range(len(mat)):
        tMat.append([])
        for j in range(len(mat[i])):
            currVal = mat[j][i]
            tMat[i].append(currVal)
    return tMat

tranMat2 = matTranspose(myRandMat2)

dotMat = []
for x in range(len(myRandMat1)): #creation of dotMat containing lists for each index of final dot product matrix
    dotMat.append([])
    for y in range(len(tranMat2)):
        dotMat[x].append([]) #each nested list contains products of ints that will later be summed for the final matrix
        for z in range(len(tranMat2)):
            dotMat[x][y].append((myRandMat1[x][z]*tranMat2[y][z])) #appends products of ints in two matrices involved in sum of each position of dot product matrix
            #appended to nested list at correct index of dotMat list
        
finalMat = []
for x in range(len(dotMat)): #iterates through nested lists and appends sum of each list to final matrix
    finalMat.append([])
    for y in dotMat[x]:
        finalMat[x].append(sum(y))

'''

print("Code 1a[inefficient]: ")
print(timeit.timeit(setup=mysetup, stmt=testcode, number=100000))

##(b: efficient???):

# source: https://numpy.org/doc/stable/reference/generated/numpy.dot.html
mysetup2 = "import numpy as np, random"
testcode2 = '''
# initialization of two random 2x2 matrices of integers
mat1 = np.random.randint(10, size=(2,2))
mat2 = np.random.randint(10, size=(2,2))

np.dot(mat1,mat2)
'''
print("Code 1b[efficient]: ")
print(timeit.timeit(setup=mysetup2, stmt=testcode2, number=100000))


#(2)--> sort list of lists(ascending order within each list; then ascending by first element of each list)
##(a: inefficient):
mysetup3 = "import random"
testcode3 = '''

LL = [] #random initialization of five nested lists in one main list
for m in range(5):
    randList = [random.randint(0,10) for i in range(random.randint(0,5))]
    LL.append(randList)

#inner sorting of lists
for x in range(len(LL)): #iterates through inner lists
    for y in range(len(LL[x])-1): #val1 in nested list
        for z in range(y+1,len(LL[x])): #val2 in nested list
            if LL[x][y] > LL[x][z]: #comparing two values
                LL[x][y],LL[x][z] = LL[x][z],LL[x][y] #values swap positions if n > n+1

for i in range(len(LL)-1):
    for j in range(i+1,len(LL)):
        if LL[i] > LL[j]:
            LL[i],LL[j] = LL[j],LL[i]

'''
print("Code 2a[inefficient]: ")
print(timeit.timeit(setup=mysetup3, stmt=testcode3, number=100000))

##(b: efficient):
mysetup4 = "import random"
testcode4 = '''

LL2 = [] #random initialization of five nested lists in one main list
for m in range(5):
    randList = [random.randint(0,10) for i in range(random.randint(0,5))]
    LL2.append(randList)

for l in LL2: #inner list sort
    l.sort()
    
LL2 = sorted(LL2) #outer list sort (result is assigned; sorted() returns a new list and leaves its argument unchanged)
'''
print("Code 2b[efficient]: ")
print(timeit.timeit(setup=mysetup4, stmt=testcode4, number=100000))

Code 1a[inefficient]: 
1.616462416999994
Code 1b[efficient]: 
2.6212622080000045
Code 2a[inefficient]: 
3.413861374999996
Code 2b[efficient]: 
1.704474042000001


#### Coding 3(Responses)-->
##### Explanation for efficiency/"lack of" efficiency in Code 1(dot product):
   Running the inefficient and "efficient" dot-product code produced unexpected runtimes. The numpy implementation was expected to run significantly faster, but it was slower than the naive implementation: about 2.62 seconds for numpy versus 1.62 seconds for the naive version over 100,000 runs. After some research, one possible explanation is a system "issue": the M1 MacBook Pros were benchmarked for data science purposes and, for reasons that are not known exactly, numpy was reported to run slower on the M1 than on a 2019 Intel system [source: https://towardsdatascience.com/are-the-new-m1-macbooks-any-good-for-data-science-lets-find-out-e61a01e8cad1]. A second likely factor, independent of hardware, is matrix size: for 2x2 matrices the fixed cost of each numpy call (and of creating the random arrays, which is part of the timed code) can outweigh the very small amount of arithmetic. Therefore, although the numpy implementation runs slower in this test, it can still be considered the more efficient approach in general.
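
   One way to probe this further is to time only the multiplication and to vary the matrix size. The sketch below assumes that approach; the function names `naive_matmul` and `time_matmul`, the sizes, and the repeat count are illustrative choices, and the numpy timing is skipped if numpy is not installed. No expected timings are given because they depend on the machine.

```python
import random
import timeit


def naive_matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def time_matmul(n, repeats=50):
    """Time only the product of two random n x n matrices (creation excluded)."""
    a = [[random.randint(0, 10) for _ in range(n)] for _ in range(n)]
    b = [[random.randint(0, 10) for _ in range(n)] for _ in range(n)]
    timings = {"naive": timeit.timeit(lambda: naive_matmul(a, b), number=repeats)}
    try:
        import numpy as np  # optional; skipped if NumPy is not installed
    except ImportError:
        return timings
    na, nb = np.array(a), np.array(b)  # conversion happens outside the timed call
    timings["numpy"] = timeit.timeit(lambda: np.dot(na, nb), number=repeats)
    return timings


for size in (2, 10, 40):
    print(size, time_matmul(size))
```

   If per-call overhead is the main cause, the gap should shrink or reverse as the size grows; if it does not, the hardware explanation becomes more plausible.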

##### Explanation for efficiency in Code 2(sort list of lists):
   Version 2 of the nested list sorting is clearly the more efficient method of "two-level" sorting. This can likely be attributed to its use of the built-in .sort() and sorted(), which in CPython are implemented in optimized C code, instead of a hand-written sort that compares two values of a list at a time in Python-level loops, followed by a second hand-written pass that orders the nested lists themselves. The measured times over 100,000 runs, both including random list generation, are 3.41 seconds (inefficient) and 1.70 seconds (efficient).
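
   The two-level sort can also be written as a single expression. This is a small sketch, not the code that was timed above; the function name `two_level_sort` and the sample data are illustrative. Note that Python compares lists element by element (lexicographically), so the outer ordering uses more than the first element when first elements tie, and an empty list sorts before any non-empty one.

```python
def two_level_sort(list_of_lists):
    """Return a new list: each inner list sorted, then the inner lists ordered."""
    return sorted(sorted(inner) for inner in list_of_lists)


nested = [[3, 1, 2], [], [0, 5], [3, 0]]
print(two_level_sort(nested))  # [[], [0, 3], [0, 5], [1, 2, 3]]
print(nested)                  # [[3, 1, 2], [], [0, 5], [3, 0]] (input unchanged)
```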