# Tasks

Specific computational tasks I want to learn how to do in Python.

## Read/print/save a string

On February 12, 2002, Donald Rumsfeld responded to a question at a Department of Defense briefing by saying:

In [1]:
dod = "Reports that say that something hasn't happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns—the ones we don't know we don't know. And if one looks throughout the history of our country and other free countries, it is the latter category that tends to be the difficult ones."
print(dod)

In [2]:
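# Write the quote to disk; utf-8 is set explicitly because the quote contains a non-ASCII dash
with open('rumsfeld_quote.txt', 'w', encoding='utf-8') as f:
    f.write(dod)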
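The heading also mentions reading, so here is a minimal sketch of loading the saved file back into a string. The helper name `read_text` is my own (hypothetical), and it assumes the file was written as UTF-8.

```python
def read_text(path, encoding="utf-8"):
    """Return the full contents of a text file as one string."""
    # The with-block closes the file automatically, even if reading fails.
    with open(path, "r", encoding=encoding) as f:
        return f.read()
```

Calling `print(read_text('rumsfeld_quote.txt'))` after the cell above should print the quote again.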

## Split a string into multiple strings with a maximum character limit

[tutorial](https://stackoverflow.com/questions/18854620/whats-the-best-way-to-split-a-string-into-fixed-length-chunks-and-work-with-the/18854817)

A function to break apart a long string. Pass the original string and desired character length.

In [3]:
def stringslice(string, length):
    # Generator of consecutive slices, each at most `length` characters long
    return (string[i:i + length] for i in range(0, len(string), length))

In [4]:
# 50-character chunks for this demo (tweets allow up to 280 characters)
drumsfeld = stringslice(dod, 50)

`drumsfeld` is a [generator object](https://realpython.com/introduction-to-python-generators/).
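One thing worth knowing about generators: they can only be consumed once. The sketch below repeats the `stringslice` function so it runs on its own, and uses a short made-up sample string instead of the full quote.

```python
def stringslice(string, length):
    # Same idea as above: lazily yield fixed-length slices
    return (string[i:i + length] for i in range(0, len(string), length))

chunks = stringslice("known knowns and known unknowns", 10)

first_pass = list(chunks)   # consumes the generator
second_pass = list(chunks)  # nothing is left to consume

print(first_pass)
print(second_pass)
```

This is why the next cell stores the result in a list (`dr`) before doing anything else with it.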

In [5]:
dr = list(drumsfeld)

In [6]:
dr

["Reports that say that something hasn't happened ar",
 'e always interesting to me, because as we know, th',
 'ere are known knowns; there are things we know we ',
 'know. We also know there are known unknowns; that ',
 'is to say we know there are some things we do not ',
 'know. But there are also unknown unknowns—the ones',
 " we don't know we don't know. And if one looks thr",
 'oughout the history of our country and other free ',
 'countries, it is the latter category that tends to',
 ' be the difficult ones.']

In [7]:
len(dr)

10

### Let's also reformat Rumsfeld's response so that each string is at most 50 characters long, but no word is cut off.

**Find a way to always break lines on whitespace.**

In [26]:
dr

["Reports that say that something hasn't happened ar",
 'e always interesting to me, because as we know, th',
 'ere are known knowns; there are things we know we ',
 'know. We also know there are known unknowns; that ',
 'is to say we know there are some things we do not ',
 'know. But there are also unknown unknowns—the ones',
 " we don't know we don't know. And if one looks thr",
 'oughout the history of our country and other free ',
 'countries, it is the latter category that tends to',
 ' be the difficult ones.']

**Here are some ways to slice this list:**

In [27]:
dr[0]

"Reports that say that something hasn't happened ar"

In [29]:
dr[1:9]

['e always interesting to me, because as we know, th',
 'ere are known knowns; there are things we know we ',
 'know. We also know there are known unknowns; that ',
 'is to say we know there are some things we do not ',
 'know. But there are also unknown unknowns—the ones',
 " we don't know we don't know. And if one looks thr",
 'oughout the history of our country and other free ',
 'countries, it is the latter category that tends to']

In [30]:
dr[-1]

' be the difficult ones.'

Slice characters within an element of the list:

In [34]:
x = dr[-4]
print(x)

 we don't know we don't know. And if one looks thr


In [35]:
x[-2]

'h'

In [37]:
dr[-4][-2]
# more concise: index the list element directly, no intermediate variable

'h'

**The next step is to make sure our original function does not break words up.**

To that end, incorporate the following logic:

For each element (string) in the list:<br>
Start at the end of the element<br>
If the character is not whitespace, check the previous character<br>
If the character is whitespace, break the line there<br>
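The logic above could be sketched roughly like this. The function name `split_on_whitespace` is my own (hypothetical), and the choice to fall back to a hard cut when a single word is longer than the limit is an assumption, not something decided in these notes.

```python
def split_on_whitespace(text, length):
    """Split text into chunks of at most `length` characters,
    breaking on whitespace wherever possible."""
    chunks = []
    start = 0
    while len(text) - start > length:
        # Start just past the end of the window and walk backwards
        # until a whitespace character is found.
        cut = start + length
        while cut > start and not text[cut].isspace():
            cut -= 1
        if cut == start:
            # Assumption: no whitespace in the window, so cut mid-word.
            cut = start + length
        chunks.append(text[start:cut])
        # Skip the whitespace we broke on so the next chunk starts on a word.
        start = cut
        while start < len(text) and text[start].isspace():
            start += 1
    if start < len(text):
        chunks.append(text[start:])
    return chunks

print(split_on_whitespace("the quick brown fox jumps over the lazy dog", 10))
```

The standard library's `textwrap.wrap(text, width=50)` does a similar job and is probably the better choice in practice; the hand-written version is just to follow the steps listed above.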

## Detect/return specific word or phrase in a string

How many times does Rumsfeld say 'unknown'? How many times does he say 'known'?

*documentation for [re](https://docs.python.org/3/library/re.html)*

In [8]:
import re

In [9]:
word1 = 'known'
word2 = 'unknown'

[tutorial](https://stackoverflow.com/questions/17268958/finding-occurrences-of-a-word-in-a-string-in-python-3)

In [10]:
count = sum(1 for x in re.finditer(r'\b%s\b' % re.escape(word1), dod))
print(count)

2


In [11]:
count = sum(1 for x in re.finditer(r'\b%s\b' % re.escape(word2), dod))
print(count)

1


### This won't work with lists. Can we write a function that takes a list and returns the items in the list that contain the word?

The result should be the elements in the list that contain the word (bold added for emphasis):

> 'ere are **known** knowns; there are things we know we '<br>
> 'know. We also know there are **known** unknowns; that '
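A possible sketch, reusing the same `\b...\b` whole-word pattern as the counting cells. The function name `items_containing` is hypothetical, and the sample list is just three of the 50-character chunks from `dr`, copied in so the block runs on its own.

```python
import re

def items_containing(word, items):
    """Return the items that contain `word` as a whole word."""
    regex = re.compile(r"\b%s\b" % re.escape(word))
    return [item for item in items if regex.search(item)]

sample = [
    'ere are known knowns; there are things we know we ',
    'know. We also know there are known unknowns; that ',
    'is to say we know there are some things we do not ',
]

print(items_containing('known', sample))
```

One caveat: because the chunks were cut at fixed positions, a word that straddles two chunks will not be found by this approach.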

### We have a way to match one word. How do we match more than one?

In [12]:
words1 = ['know', 'known', 'knowns']
words2 = ['unknown', 'unknowns']
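One way this might work is to join the words into a single alternation pattern and tally the matches with `collections.Counter`. The helper name `count_words` is hypothetical, and the sample text is one sentence from the quote rather than the whole thing.

```python
import re
from collections import Counter

def count_words(words, text):
    """Count whole-word occurrences of each word in `words`."""
    # Builds e.g. \b(?:know|known|knowns)\b ; re.escape guards special characters.
    pattern = r"\b(?:%s)\b" % "|".join(re.escape(w) for w in words)
    return Counter(re.findall(pattern, text))

sample = "there are known knowns; there are things we know we know."
words1 = ['know', 'known', 'knowns']

print(count_words(words1, sample))
```

The trailing `\b` forces the regex to keep trying alternatives until it reaches a word boundary, so 'know' does not get counted inside 'known' or 'knowns'.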

## Detect/return specific word or phrase across multiple strings

In [38]:
dr

["Reports that say that something hasn't happened ar",
 'e always interesting to me, because as we know, th',
 'ere are known knowns; there are things we know we ',
 'know. We also know there are known unknowns; that ',
 'is to say we know there are some things we do not ',
 'know. But there are also unknown unknowns—the ones',
 " we don't know we don't know. And if one looks thr",
 'oughout the history of our country and other free ',
 'countries, it is the latter category that tends to',
 ' be the difficult ones.']

## Write multiple strings to individual markdown files

We want this part to create file names in sequential order.
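A minimal sketch of how that could look with `pathlib`. The function name `write_chunks`, the `rumsfeld` prefix, and the zero-padded numbering scheme are all my own assumptions; zero-padding just keeps the files in order when sorted by name.

```python
from pathlib import Path

def write_chunks(chunks, out_dir, prefix="rumsfeld"):
    """Write each string to its own .md file with a sequential name."""
    chunks = list(chunks)  # also accepts a generator
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Pad the number so that e.g. 02 sorts before 10.
    width = max(2, len(str(len(chunks))))
    paths = []
    for number, chunk in enumerate(chunks, start=1):
        path = out_dir / f"{prefix}_{number:0{width}d}.md"
        path.write_text(chunk, encoding="utf-8")
        paths.append(path)
    return paths
```

For example, `write_chunks(dr, 'quotes')` would be expected to produce `quotes/rumsfeld_01.md` through `quotes/rumsfeld_10.md`.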

## Detect/return specific word or phrase across files in a directory
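A rough sketch of one approach: loop over the files in a directory and apply the same whole-word regex to each file's text. The function name `files_containing` and the default `*.md` glob pattern are assumptions, and it assumes the files are UTF-8 text.

```python
import re
from pathlib import Path

def files_containing(word, directory, pattern="*.md"):
    """Map file name -> number of whole-word matches, for files with a match."""
    regex = re.compile(r"\b%s\b" % re.escape(word))
    matches = {}
    for path in sorted(Path(directory).glob(pattern)):
        count = len(regex.findall(path.read_text(encoding="utf-8")))
        if count:
            matches[path.name] = count
    return matches
```

Calling `files_containing('known', 'quotes')` would return a dictionary of the matching file names and how often the word appears in each.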

## Perform the above tasks outside of a Jupyter notebook (in a script or something)