# Before you start:
- Read the README.md file
- Comment as much as you can and use the resources in the README.md file
- Happy learning!

In [1]:
# import reduce from functools, numpy and pandas
from functools import reduce
import numpy as np
import pandas as pd

# Challenge 1 - Mapping

#### We will use the `map()` function to clean up the words in a book.

In the following cell, we will read a text file containing the book The Prophet by Khalil Gibran.

In [2]:
# Run this code:

location = '../58585-0.txt'
with open(location, 'r', encoding="utf8") as f:
    prophet = f.read().split(' ')
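
Because the cell above splits on a single space, line breaks stay inside the resulting tokens, which is why we need to clean them up later. The sketch below illustrates this on a short made-up string (`sample` is a stand-in, not the real file contents) and contrasts it with `split()` called without arguments.

```python
# Toy stand-in for the book text (an assumption, not the real file contents).
sample = "the{7} chosen and the\nbeloved, who"

# Splitting on a single space keeps the line break inside a token.
by_space = sample.split(' ')

# Splitting with no argument splits on any run of whitespace, including '\n'.
by_whitespace = sample.split()

print(by_space)
print(by_whitespace)
```

The lab keeps `split(' ')` so that the later mapping steps have something to clean.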

#### Let's remove the first 568 words since they contain information about the book but are not part of the book itself. 

Do this by removing from `prophet` elements 0 through 567 of the list (you can also do this by keeping elements 568 through the last element).

In [3]:
# Your code here:
prophet = prophet[568:]
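
As a quick check of the slicing rule used above, here is a minimal sketch on a toy list (`words` and `n` are illustrative names, not part of the lab). Keeping `words[n:]` drops elements `0` through `n - 1`.

```python
# Toy list standing in for the list of words.
words = ['w0', 'w1', 'w2', 'w3', 'w4']
n = 2

# Keep everything from index n onward, which drops elements 0 .. n-1.
kept = words[n:]

# The dropped part and the kept part together rebuild the original list.
dropped = words[:n]

print(kept)
print(dropped + kept == words)
```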

If you look through the words, you will find that many words have a reference attached to them. For example, let's look at words 1 through 10.

In [4]:
# Your code here:
prophet[1:11]

['the{7}',
 'chosen',
 'and',
 'the\nbeloved,',
 'who',
 'was',
 'a',
 'dawn',
 'unto',
 'his']

#### The next step is to create a function that will remove references. 

We will do this by splitting the string on the `{` character and keeping only the part before this character. Write your function below.

In [5]:
def reference(x):
    '''
    Input: A string
    Output: The string with references removed
    
    Example:
    Input: 'the{7}'
    Output: 'the'
    '''
    return (x.split("{"))[0] if "{" in x else x 
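
The `if "{" in x` check is not strictly needed, because `str.split` always returns at least one element. The sketch below shows two equivalent variants; `reference_simple` and `reference_partition` are hypothetical names used only for illustration.

```python
def reference_simple(x):
    # split() returns [x] when "{" is absent, so index 0 is always safe.
    return x.split("{")[0]


def reference_partition(x):
    # partition() returns (before, separator, after); we keep the part before.
    return x.partition("{")[0]


examples = ['the{7}', 'chosen', 'a{12}b']
results = [(reference_simple(w), reference_partition(w)) for w in examples]
print(results)
```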

Now that we have our function, use the `map()` function to apply it to every word in our book, The Prophet. Assign the resulting list to a new variable called `prophet_reference`.

In [6]:
# Your code here:
prophet_reference = list(map(reference, prophet))
prophet_reference[0:11]

['PROPHET\n\n|Almustafa,',
 'the',
 'chosen',
 'and',
 'the\nbeloved,',
 'who',
 'was',
 'a',
 'dawn',
 'unto',
 'his']
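
Note that in Python 3 `map()` returns a lazy iterator, which is why the cell above wraps it in `list()`. A small sketch of that behaviour follows, using a hypothetical helper `strip_reference` in place of `reference`.

```python
def strip_reference(x):
    # Hypothetical stand-in for the reference() function defined earlier.
    return x.split("{")[0]


words = ['the{7}', 'chosen', 'and']

lazy = map(strip_reference, words)  # nothing is computed yet
first = next(lazy)                  # computes only the first element
rest = list(lazy)                   # consumes the remaining elements
again = list(lazy)                  # the iterator is now exhausted

print(first, rest, again)
```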

Another thing you may have noticed is that some words contain a line break. Let's write a function to split those words. Our function will return the string split on the character `\n`. Write your function in the cell below.

In [7]:
def line_break(x):
    '''
    Input: A string
    Output: A list of strings split on the line break (\n) character
        
    Example:
    Input: 'the\nbeloved'
    Output: ['the', 'beloved']
    '''
    
    # Your code here:
    return x.splitlines()
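
The prompt asks for a split on `\n`, while the function uses `splitlines()`. The two usually agree, but they differ on edge cases such as a trailing line break or an empty string. The sketch below compares them on a few hand-picked strings.

```python
# Both approaches agree when the line break sits between two words.
middle = 'birth.\n\nAnd'
print(middle.split('\n'))    # splits on '\n' only
print(middle.splitlines())   # also handles '\r\n' and other line boundaries

# They differ when the string ends with a line break or is empty.
trailing = 'eternity.\n'
trailing_split = trailing.split('\n')
trailing_lines = trailing.splitlines()

empty_split = ''.split('\n')
empty_lines = ''.splitlines()

print(trailing_split, trailing_lines)
print(empty_split, empty_lines)
```

With `splitlines()`, an empty input word produces an empty list, so it simply disappears when the list of lists is flattened.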

Apply the `line_break` function to the `prophet_reference` list. Name the new list `prophet_line`.

In [8]:
# Your code here:
prophet_line = list(map(line_break, prophet_reference))

If you look at the elements of `prophet_line`, you will see that the function returned lists, not strings, so `prophet_line` is now a list of lists. Flatten it using a list comprehension. Assign this new list to `prophet_flat`.

In [9]:
# Your code here:

prophet_flat = [j for i in prophet_line for j in i]
prophet_flat[1:11]

['',
 '|Almustafa,',
 'the',
 'chosen',
 'and',
 'the',
 'beloved,',
 'who',
 'was',
 'a']
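
The output above still contains empty strings left over from blank lines. As a hedged sketch (not part of the lab's required solution), the example below flattens a small made-up nested list in two equivalent ways and then drops the empty strings.

```python
from itertools import chain

# Small made-up sample shaped like prophet_line.
nested = [['PROPHET', '', '|Almustafa,'], ['the'], ['the', 'beloved,']]

# Nested list comprehension: outer loop first, inner loop second.
flat_comprehension = [word for sublist in nested for word in sublist]

# The standard library offers the same flattening through itertools.
flat_chain = list(chain.from_iterable(nested))

# Empty strings are falsy, so a simple truth test removes them.
non_empty = [word for word in flat_chain if word]

print(flat_comprehension == flat_chain)
print(non_empty)
```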

# Challenge 2 - Filtering

When printing out a few words from the book, we see that there are words that we may not want to keep if we choose to analyze the corpus of text. Below is a list of words that we would like to get rid of. Create a function that returns `False` if the input word is in the specified list and `True` otherwise.

In [10]:
def word_filter(x):
    '''
    Input: A string
    Output: True if the word is not in the specified list and False if the word is in the list
        
    Example:
    word list = ['and', 'the']
    Input: 'and'
    Output: False
    
    Input: 'John'
    Output: True
    '''
    
    word_list = ['and', 'the', 'a', 'an']
    
    return x not in word_list

Use the `filter()` function to filter out the words specified in the `word_filter()` function. Store the filtered list in the variable `prophet_filter`.

In [11]:
prophet_filter = list(filter(word_filter, prophet_flat))
prophet_filter[1:11]

['',
 '|Almustafa,',
 'chosen',
 'beloved,',
 'who',
 'was',
 'dawn',
 'unto',
 'his',
 'own']
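
`filter()` keeps exactly the elements for which the function returns `True`, so it is equivalent to a list comprehension with a condition. The sketch below shows this on a toy list; `STOP_WORDS` and `keep_word` are illustrative names, and the set is an optional choice for faster membership tests.

```python
# Illustrative names; a set gives constant-time membership checks.
STOP_WORDS = {'and', 'the', 'a', 'an'}


def keep_word(x):
    # True means "keep this word".
    return x not in STOP_WORDS


words = ['the', 'chosen', 'and', 'the', 'beloved,', 'who', 'was', 'a', 'dawn']

via_filter = list(filter(keep_word, words))
via_comprehension = [w for w in words if keep_word(w)]

print(via_filter)
print(via_filter == via_comprehension)
```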

# Bonus Challenge - Part 1

Rewrite the `word_filter` function above to not be case sensitive.

In [12]:
def word_filter_case(x):
   
    word_list = ['and', 'the', 'a', 'an']
    
    # Your code here:
    return x.lower() not in word_list
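
To see the effect of lowercasing, here is a minimal sketch that repeats the function on a few hand-picked words. It also shows one limitation: a word with attached punctuation, such as `'the,'`, is not matched.

```python
def word_filter_case(x):
    word_list = ['and', 'the', 'a', 'an']
    # Lowercase the input so 'The' and 'AND' match the list entries.
    return x.lower() not in word_list


words = ['The', 'AND', 'An', 'Prophet', 'a']
kept = list(filter(word_filter_case, words))

# Punctuation is part of the string, so this word is kept.
kept_with_comma = word_filter_case('the,')

print(kept)
print(kept_with_comma)
```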
    

# Challenge 3 - Reducing

#### Now that we have significantly cleaned up our text corpus, let's use the `reduce()` function to put the words back together into one long string separated by spaces. 

We will start by writing a function that takes two strings and concatenates them together with a space between the two strings.

In [13]:
def concat_space(a, b):
    '''
    Input: Two strings
    Output: A single string separated by a space
        
    Example:
    Input: 'John', 'Smith'
    Output: 'John Smith'
    '''
    
    # Your code here:
    return f"{a} {b}"
    

Use the function above to reduce the text corpus in the list `prophet_filter` into a single string. Assign this new string to the variable `prophet_string`.

In [14]:
# Your code here:
prophet_string = reduce(concat_space, prophet_filter)
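
For this particular task, `reduce()` with `concat_space` gives the same result as `str.join`. The sketch below checks that on a toy list. It is meant only as a comparison, since the exercise is about `reduce()`.

```python
from functools import reduce


def concat_space(a, b):
    return f"{a} {b}"


words = ['chosen', 'beloved,', 'who', 'was', 'dawn']

# reduce() folds the list from the left: ((('chosen beloved,') who) ...)
via_reduce = reduce(concat_space, words)

# join() builds the same string in one call.
via_join = " ".join(words)

# join() also handles an empty list, where reduce() needs an initial value.
empty_join = " ".join([])
empty_reduce = reduce(concat_space, [], "")

print(via_reduce)
print(via_reduce == via_join)
```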

# Challenge 4 - Applying Functions to DataFrames

#### Our next step is to apply a function to a dataframe and transform all cells in the selected columns.

To do this, we will load a dataset below and then write a function that will perform the transformation.

In [15]:
# Run this code:

# The dataset below contains information about pollution from PM2.5 particles in Beijing 

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00381/PRSA_data_2010.1.1-2014.12.31.csv"
pm25 = pd.read_csv(url)

Let's look at the data using the `head()` function.

In [16]:
# Your code here:
pm25.head()


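| | No | year | month | day | hour | pm2.5 | DEWP | TEMP | PRES | cbwd | Iws | Is | Ir |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2010 | 1 | 1 | 0 | NaN | -21 | -11.0 | 1021.0 | NW | 1.79 | 0 | 0 |
| 1 | 2 | 2010 | 1 | 1 | 1 | NaN | -21 | -12.0 | 1020.0 | NW | 4.92 | 0 | 0 |
| 2 | 3 | 2010 | 1 | 1 | 2 | NaN | -21 | -11.0 | 1019.0 | NW | 6.71 | 0 | 0 |
| 3 | 4 | 2010 | 1 | 1 | 3 | NaN | -21 | -14.0 | 1019.0 | NW | 9.84 | 0 | 0 |
| 4 | 5 | 2010 | 1 | 1 | 4 | NaN | -20 | -12.0 | 1018.0 | NW | 12.97 | 0 | 0 |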


The next step is to create a function that divides a cell by 24 to produce an hourly figure. Write the function below.

In [17]:
def hourly(x):
    '''
    Input: A numerical value
    Output: The value divided by 24
        
    Example:
    Input: 48
    Output: 2.0
    '''
    
    # Your code here:
    return x/24
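
Because division is vectorized in pandas, the same `hourly` function works on a single number and on a whole column. The sketch below applies it to a small made-up dataframe (`toy` and its values are assumptions for illustration) and compares the result with dividing the columns directly.

```python
import pandas as pd


def hourly(x):
    return x / 24


# Made-up data; only the column names mirror the real dataset.
toy = pd.DataFrame({"Iws": [24.0, 48.0], "Is": [0, 12], "label": ["NW", "SE"]})
cols = ["Iws", "Is"]

# DataFrame.apply passes each selected column to hourly() as a Series.
converted = toy[cols].apply(hourly)

# Dividing the columns directly gives the same values.
direct = toy[cols] / 24

print(converted)
print(converted.equals(direct))
```

Selecting the numeric columns first avoids applying the division to text columns such as `label`.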
    

Apply this function to the columns `Iws`, `Is`, and `Ir`. Store this new dataframe in the variable `pm25_hourly`.

In [18]:
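# Your code here:
columns_to_apply = ["Iws", "Is", "Ir"]
# Build the new dataframe requested above from the three selected columns.
pm25_hourly = pm25[columns_to_apply].apply(hourly)
# Write the hourly values back into pm25, as the original cell did, so the cells below are unaffected.
pm25[columns_to_apply] = pm25_hourly
pm25.head(100)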


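| | No | year | month | day | hour | pm2.5 | DEWP | TEMP | PRES | cbwd | Iws | Is | Ir |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2010 | 1 | 1 | 0 | NaN | -21 | -11.0 | 1021.0 | NW | 0.074583 | 0.0 | 0.0 |
| 1 | 2 | 2010 | 1 | 1 | 1 | NaN | -21 | -12.0 | 1020.0 | NW | 0.205000 | 0.0 | 0.0 |
| 2 | 3 | 2010 | 1 | 1 | 2 | NaN | -21 | -11.0 | 1019.0 | NW | 0.279583 | 0.0 | 0.0 |
| 3 | 4 | 2010 | 1 | 1 | 3 | NaN | -21 | -14.0 | 1019.0 | NW | 0.410000 | 0.0 | 0.0 |
| 4 | 5 | 2010 | 1 | 1 | 4 | NaN | -20 | -12.0 | 1018.0 | NW | 0.540417 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 96 | 2010 | 1 | 4 | 23 | 31.0 | -26 | -15.0 | 1035.0 | NW | 8.268750 | 0.0 | 0.0 |
| 96 | 97 | 2010 | 1 | 5 | 0 | 30.0 | -26 | -17.0 | 1035.0 | NW | 8.399167 | 0.0 | 0.0 |
| 97 | 98 | 2010 | 1 | 5 | 1 | 34.0 | -26 | -18.0 | 1035.0 | NW | 8.566667 | 0.0 | 0.0 |
| 98 | 99 | 2010 | 1 | 5 | 2 | 27.0 | -26 | -19.0 | 1035.0 | NW | 8.697083 | 0.0 | 0.0 |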


#### Our last challenge will be to create an aggregate function and apply it to a select group of columns in our dataframe.

Write a function that returns the standard deviation of a column divided by the number of values in that column minus 1, that is, `std / (n - 1)`. Since we are using pandas, do not use the `len()` function. One alternative is to use `count()`, which counts the non-missing values. Also, use the NumPy version of the standard deviation (`np.std`).

In [19]:
def sample_sd(x):
    '''
    Input: A Pandas series of values
    Output: The standard deviation divided by the number of elements in the series minus 1
        
    Example:
    Input: pd.Series([1,2,3,4])
    Output: 0.3726779962
    '''
    
    # Your code here:
    return np.std(x)/(x.count() - 1)
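
We can check the docstring example by hand. For `[1, 2, 3, 4]` the mean is 2.5, the population variance is 1.25, and `np.std` uses the population formula (`ddof=0`) by default, so the result is `sqrt(1.25) / 3`. The sketch below repeats the function and compares it with that value.

```python
import numpy as np
import pandas as pd


def sample_sd(x):
    return np.std(x) / (x.count() - 1)


s = pd.Series([1, 2, 3, 4])
result = sample_sd(s)

# Hand computation: population standard deviation divided by (n - 1).
expected = np.sqrt(1.25) / 3

# For comparison, pandas' own Series.std() defaults to ddof=1.
pandas_default_sd = s.std()

print(result, expected, pandas_default_sd)
```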

In [20]:
pm26 = pm25[columns_to_apply]
pm26 = pm26.apply(sample_sd)
pm26

Iws    4.754929e-05
Is     7.229519e-07
Ir     1.346183e-06
dtype: float64