# Enumerating k-mers Lexicographically

### 023_Rosalind_LEXF

## Organizing Strings

When cataloguing a collection of genetic strings, we should have an established system by which to organize them. The standard method is to organize strings as they would appear in a dictionary, so that "APPLE" precedes "APRON", which in turn comes before "ARMOR".

## Problem
Assume that an alphabet $A$ has a predetermined order; that is, we write the alphabet as a permutation $A = (a_1, a_2, \ldots, a_k)$, where $a_1 < a_2 < \cdots < a_k$. For instance, the English alphabet is organized as $(A, B, \ldots, Z)$.

Given two strings $s$ and $t$ having the same length $n$, we say that $s$ precedes $t$ in the lexicographic order (and write $s <_{\mathrm{Lex}} t$) if the first symbol $s[j]$ that doesn't match $t[j]$ satisfies $s[j] < t[j]$ in $A$.
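
The definition above can be sketched as a small comparison routine. This is only an illustration under stated assumptions: the function name `precedes` and the `rank` lookup table are hypothetical and are not part of the problem statement, and both strings are assumed to have the same length and to use only symbols from the alphabet.

```python
def precedes(s, t, alphabet):
    """Return True if s comes strictly before t in the order given by alphabet."""
    # Map each symbol to its position in the ordered alphabet.
    rank = {symbol: position for position, symbol in enumerate(alphabet)}
    for a, b in zip(s, t):
        if a != b:
            # The first mismatching position decides the order.
            return rank[a] < rank[b]
    # No mismatch: equal-length strings are identical, so neither precedes.
    return False


alphabet = ["A", "C", "G", "T"]
print(precedes("AC", "AG", alphabet))  # True: first mismatch is C vs G
print(precedes("GA", "CT", alphabet))  # False: first mismatch is G vs C
```

Because the order comes from `alphabet` rather than from Python's built-in string comparison, the same routine also works for an alphabet that is not in the usual A to Z order.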

### Given:

A collection of at most 10 symbols defining an ordered alphabet, and a positive integer $n$ ($n \leq 10$).

### Return:

All strings of length $n$ that can be formed from the alphabet, ordered lexicographically (use the standard order of symbols in the English alphabet).

## Sample Dataset

```
A C G T
2
```

## Sample Output

```
AA
AC
AG
AT
CA
CC
CG
CT
GA
GC
GG
GT
TA
TC
TG
TT
```

# Solution one - make then sort

### Make all the possible permutations and store them in a list.

- load symbols and integer from a file.
- store symbols in a list and sort the list
- make all the possible permutations

### Sort the list
- Use .sort()

### Print to a file
- print to file
- to screen print confirmation and number of permutations
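
The make-then-sort steps above can be sketched compactly. This is not the notebook's own implementation: it swaps the hand-written builder for `itertools.product` from the standard library, and the function name `enumerate_kmers` is hypothetical. Reading the input file and writing the result file are left out.

```python
from itertools import product


def enumerate_kmers(symbols, n):
    """Return all length-n strings over symbols, sorted lexicographically."""
    ordered = sorted(symbols)  # standard alphabetical order of the symbols
    # product(..., repeat=n) yields every length-n tuple of symbols.
    kmers = ["".join(combo) for combo in product(ordered, repeat=n)]
    # "Make then sort": product already emits tuples in sorted order when its
    # input is sorted, so this sort is kept only to mirror the outline.
    kmers.sort()
    return kmers


kmers = enumerate_kmers(["A", "C", "G", "T"], 2)
print(len(kmers))   # 16
print(kmers[:5])    # ['AA', 'AC', 'AG', 'AT', 'CA']
```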

In [1]:
# Function to read in file
## take a string path
## return a list of symbols and an integer
def readLexf(path):
    # the with-block closes the file even if reading fails
    with open(path, "r") as fin:
        lines = fin.readlines()
    
    # convert first line into a list of symbols and second into an integer
    clst = lines[0].split()
    n = int(lines[1].strip())
    print(clst)
    print(n)
    
    return clst, n

In [2]:
# Function to make all possible permutations
## take in a list of permutations, list of characters and an integer for the length of each permutation
## return a list of permutations
def makePerms(plst, clst, n):
    # base case:
    # the strings in plst already have the requested length
    # (the initial call passes single symbols, so this also covers n == 1)
    if n <= 1:
        return list(plst)

    # all other cases: extend every string by each character in turn
    nplst = [] # empty array to hold all the permutations
    for i in plst:
        for j in clst:
            nplst.append(i+j)
    # call the function and pass the new list of permutations
    return makePerms(nplst, clst, n - 1)

In [3]:
# Function to make file of permutations
## Takes list of chars and int 
## Calls writePerm and makePerms
def makePermFile(lst_char, len_int):
    perm_lst = makePerms(lst_char, lst_char, len_int)
    perm_lst.sort()
    writePerm(perm_lst)

In [4]:
# Function to write sorted list to a file
## take in sorted list
## create a file
def writePerm(per_lst):
    # open a file; the with-block closes it when done
    with open('result_023.txt', 'w') as fout:
        # loop through the permutations, one per line
        for i in per_lst:
            fout.write(i + "\n")

In [5]:
# Command function 
## takes a string path
def getPerms(path):
    # get data from rosalind file
    cl, ni =  readLexf(path)
    n = int(ni)
    # make the permutations and sort
    perms = makePerms(cl, cl, n)
    perms.sort()
    
    print("Number of permutations:", len(perms))
    # write the results to a file
    writePerm(perms)

In [6]:
getPerms("rosalind_lexf.txt")

['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
2
Number of permutations: 100
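
The count printed above is consistent with the size of the search space: with $k$ symbols and length $n$ there are $k^n$ strings, and here $10^2 = 100$. A quick sanity check along these lines is sketched below. The helper name `check_result` is hypothetical, and it assumes the strings have already been loaded into a list and that the standard alphabetical order is the intended one.

```python
def check_result(strings, symbols, n):
    """Return True if strings looks like a complete, ordered enumeration."""
    expected_count = len(set(symbols)) ** n  # k choices at each of n positions
    right_count = len(strings) == expected_count
    no_duplicates = len(set(strings)) == len(strings)
    in_order = strings == sorted(strings)
    right_length = all(len(s) == n for s in strings)
    return right_count and no_duplicates and in_order and right_length


print(check_result(["AA", "AC", "CA", "CC"], ["A", "C"], 2))  # True
print(check_result(["AA", "CA", "AC", "CC"], ["A", "C"], 2))  # False: not in order
```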
