# Problem

A string is simply an ordered collection of symbols selected from some alphabet and formed into a word; the length of a string is the number of symbols that it contains.

An example of a length-21 DNA string (whose alphabet contains the symbols 'A', 'C', 'G', and 'T') is "ATGCTTCAGAAAGGTCTTACG".
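As a quick check of the definition above, the length of the example string can be computed with Python's built-in `len`, and the symbols it actually uses can be recovered with `set`. The variable names here are illustrative only.

```python
# Illustrative variable name; the string is the example from the text.
example = "ATGCTTCAGAAAGGTCTTACG"

length = len(example)            # number of symbols in the string
alphabet = sorted(set(example))  # distinct symbols that actually occur

print(length)    # 21
print(alphabet)  # ['A', 'C', 'G', 'T']
```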

Given: A DNA string $s$ of length at most 1000 nt.

Return: Four integers (separated by spaces) counting the respective number of times that the symbols 'A', 'C', 'G', and 'T' occur in $s$.
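The constraints in the problem statement (length at most 1000 nt, symbols drawn from 'A', 'C', 'G', 'T') can be checked before counting. The helper below is a minimal sketch; the name `is_valid_dna` and its parameters are hypothetical and not part of the original exercise.

```python
def is_valid_dna(s, max_len=1000, alphabet="ACGT"):
    """Return True if s respects the length limit and uses only symbols from alphabet.

    Hypothetical helper: the name and default arguments are assumptions.
    """
    # set(s) <= set(alphabet) is a subset test: every symbol of s must be allowed.
    return len(s) <= max_len and set(s) <= set(alphabet)


print(is_valid_dna("ACGT"))        # True
print(is_valid_dna("ACGU"))        # False: 'U' is not in the DNA alphabet
print(is_valid_dna("A" * 1001))    # False: longer than 1000 nt
```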

### Sample dataset

In [1]:
s = 'AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC'

### Sample output

In [2]:
res = '20 12 17 21'
res

'20 12 17 21'

### Solution

In [3]:
import numpy as np

In [4]:
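def count_letter_frequencies(s, order=('A', 'C', 'G', 'T')):
    # np.unique returns the distinct symbols in s together with their counts.
    bases, freq = np.unique(list(s), return_counts=True)
    res = dict(zip(bases, freq))
    # Report counts in the requested order; a symbol absent from s counts as 0.
    output = ' '.join(str(res.get(letter, 0)) for letter in order)
    return output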

In [5]:
count_letter_frequencies(s)

'20 12 17 21'

In [6]:
assert count_letter_frequencies(s) == res
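For comparison, the same counts can be obtained with the standard library alone. The sketch below uses `collections.Counter`, which returns 0 for symbols that never occur; the function name `count_nucleotides` is hypothetical, and this is an alternative to the NumPy solution, not a replacement for it.

```python
from collections import Counter


def count_nucleotides(s, order=("A", "C", "G", "T")):
    """Return the counts of the symbols in `order` as a space-separated string.

    Hypothetical stdlib-only variant; the name is an assumption.
    """
    counts = Counter(s)  # maps each symbol to its number of occurrences
    # Counter yields 0 for missing keys, so absent symbols need no special case.
    return " ".join(str(counts[base]) for base in order)


# Sample dataset from the problem statement.
sample = 'AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC'
print(count_nucleotides(sample))  # 20 12 17 21
```

A single pass with `Counter` keeps the work linear in the length of the string, whereas calling `str.count` once per symbol would scan the string four times; for inputs of at most 1000 nt either approach is fast enough.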

### Real data

In [7]:
data = 'TCCGATGCAGTTGCAACATGTCGTAGAAAGTATCAGAGTACGGCACTATAGATAACATACTAGTCTAAGCGCGAAATCAGGTCTCAAAAGAGGGACTTATGTGCTTGCCATGCAGTCGGGCGAAAGGGAGTCATGCGTATTCACAGCAATTACGCGCTCTCATTTCATATACAAGTCCGTACACTATTATCTAATCCTTGAAGACTACTACGGTCTCTATGCTCAAAGACAATCCTCTACGCATCGATTTCGGTTCGTCCTGGTCGTGCCAACAGTGAAACGGTTCTAAATATCCTCGCCGCTCACGTCTCGTCCATTTTTTACCGGTCATGGTTGGAGTAAAACGACTTGAACAATAACGGTAGAGGTCGAGTCTGGTTCCTGACCGAGTGAAGGATCTCCCGAACCTCCATCGCGGTGCAGAGCTTACCACCTGACTAACAGCGCGATCGAGGAGATTGGCATTAACTCGTTGCCAGGGTTCGGTATTATCGAGCAAGGTAACGACGTTGCAGTCCCCTAGTTAACGTTTAATGCCCCATCGTATAGAACAAAGTGAACTGCGCAAACTCGCACGATAATATTACACATGGCGACTGATACTTGCCTAATTGAGAGATGTGGAACCGAGGTACGGCATCCCGGAAAATGTCTACAGCCTAGATAATATGGTACGCAAGGGAACGATAAGCGTACTTTGACTAAGCACCGCAGTGAGAAATTACCAGCGTTTACGGTAATACCGCCGGCGCCCTCGACCATATACTGCGACGCGGTGTTCCGGTTTTGGGATCAAAACAGTGATCGCCTCAATCGGAGAATGAC'
count_letter_frequencies(data)

'229 200 196 200'