# Before you start:
- Read the README.md file
- Comment as much as you can and use the resources in the README.md file
- Happy learning!

In [1]:
# Import reduce from functools, numpy and pandas

import numpy as np
import pandas as pd
from functools import reduce

# Challenge 1 - Mapping

#### We will use the map function to clean up words in a book.

In the following cell, we will read a text file containing the book The Prophet by Khalil Gibran.

In [2]:
# Run this code:

location = '../data/58585-0.txt'
with open(location, 'r', encoding="utf8") as f:
    prophet = f.read().split(' ')

#### Let's remove the first 568 words since they contain information about the book but are not part of the book itself. 

Do this by removing from `prophet` elements 0 through 567 of the list (you can also do this by keeping elements 568 through the last element).

In [13]:
# your code here
prophet = prophet[568:]
prophet

['sheaves',
 'of',
 'corn',
 'he',
 'gathers',
 'you',
 'unto\nhimself.\n\nHe',
 'threshes',
 'you',
 'to',
 'make',
 'you',
 'naked.\n\nHe',
 'sifts',
 'you',
 'to',
 'free',
 'you',
 'from',
 'your\nhusks.\n\nHe',
 'grinds',
 'you',
 'to',
 'whiteness.\n\nHe',
 'kneads',
 'you',
 'until',
 'you',
 'are',
 'pliant;\n\nAnd',
 'then',
 'he',
 'assigns',
 'you',
 'to',
 'his',
 'sacred\nfire,',
 'that',
 'you',
 'may',
 'become',
 'sacred',
 'bread\nfor',
 'God’s',
 'sacred',
 'feast.\n\n*****\n\nAll',
 'these',
 'things',
 'shall',
 'love',
 'do',
 'unto',
 'you\nthat',
 'you',
 'may',
 'know',
 'the',
 'secrets',
 'of',
 'your\nheart,',
 'and',
 'in',
 'that',
 'knowledge',
 'become',
 'a\nfragment',
 'of',
 'Life’s',
 'heart.\n\nBut',
 'if',
 'in',
 'your',
 'fear',
 'you',
 'would',
 'seek',
 'only\nlove’s',
 'peace',
 'and',
 'love’s',
 'pleasure,\n\nThen',
 'it',
 'is',
 'better',
 'for',
 'you',
 'that',
 'you\ncover',
 '{17}your',
 'nakedness',
 'and',
 'pass',
 'out',
 'of\nlove

If you look through the words, you will find that many of them have a reference attached, such as the `{17}` in `'{17}your'`. For example, let's look at the words at indices 1 through 9.

In [12]:
# your code here
prophet[1:10]

['of',
 'corn',
 'he',
 'gathers',
 'you',
 'unto\nhimself.\n\nHe',
 'threshes',
 'you',
 'to']

#### The next step is to create a function that will remove references. 

We will do this by splitting the string on the `{` character and keeping only the part before this character. Write your function below.

In [21]:
def reference(x):
    '''
    Input: A string
    Output: The string with references removed
    
    Example:
    Input: 'the{7}'
    Output: 'the'
    '''
    
    # your code here
    return x.split("{")[0]
reference('the{7}')


'the'
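One caveat with splitting on `{`: when the reference comes before the word, as in `'{17}your'`, everything after the brace is discarded and the word itself is lost. The sketch below compares the exercise's approach with a regex-based alternative. The names `reference_split` and `reference_regex` and the sample list are hypothetical, and the pattern assumes references always look like `{digits}`.

```python
import re


def reference_split(word):
    # Approach from the exercise: keep only the text before the first '{'.
    return word.split("{")[0]


def reference_regex(word):
    # Hypothetical alternative: delete only the '{digits}' marker itself.
    return re.sub(r"\{\d+\}", "", word)


samples = ["the{7}", "{17}your", "plain"]
split_result = [reference_split(w) for w in samples]
regex_result = [reference_regex(w) for w in samples]

print(split_result)  # ['the', '', 'plain']
print(regex_result)  # ['the', 'your', 'plain']
```

Either version satisfies the docstring example; the regex variant is only worth the extra import if leading references matter for your analysis.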

Now that we have our function, use the `map()` function to apply it to every word in our book, The Prophet. Store the result in a new list called `prophet_reference`.
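As a quick illustration of how `map()` behaves, here is a minimal sketch on a toy list. The helper `strip_reference` and the sample words are made up for this example. In Python 3, `map()` returns a lazy iterator, which is why it is wrapped in `list()`.

```python
def strip_reference(word):
    # Keep only the text before the first '{'.
    return word.split("{")[0]


words = ["love{3}", "and", "sea{12}"]

lazy = map(strip_reference, words)  # nothing is computed yet
cleaned = list(lazy)                # consuming the iterator runs the function
leftover = list(lazy)               # the iterator is now exhausted

# A list comprehension gives the same result.
same = [strip_reference(w) for w in words]

print(cleaned)   # ['love', 'and', 'sea']
print(leftover)  # []
```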

In [35]:
# your code here
prophet_reference = list(map(reference, prophet))
prophet_reference


['sheaves',
 'of',
 'corn',
 'he',
 'gathers',
 'you',
 'unto\nhimself.\n\nHe',
 'threshes',
 'you',
 'to',
 'make',
 'you',
 'naked.\n\nHe',
 'sifts',
 'you',
 'to',
 'free',
 'you',
 'from',
 'your\nhusks.\n\nHe',
 'grinds',
 'you',
 'to',
 'whiteness.\n\nHe',
 'kneads',
 'you',
 'until',
 'you',
 'are',
 'pliant;\n\nAnd',
 'then',
 'he',
 'assigns',
 'you',
 'to',
 'his',
 'sacred\nfire,',
 'that',
 'you',
 'may',
 'become',
 'sacred',
 'bread\nfor',
 'God’s',
 'sacred',
 'feast.\n\n*****\n\nAll',
 'these',
 'things',
 'shall',
 'love',
 'do',
 'unto',
 'you\nthat',
 'you',
 'may',
 'know',
 'the',
 'secrets',
 'of',
 'your\nheart,',
 'and',
 'in',
 'that',
 'knowledge',
 'become',
 'a\nfragment',
 'of',
 'Life’s',
 'heart.\n\nBut',
 'if',
 'in',
 'your',
 'fear',
 'you',
 'would',
 'seek',
 'only\nlove’s',
 'peace',
 'and',
 'love’s',
 'pleasure,\n\nThen',
 'it',
 'is',
 'better',
 'for',
 'you',
 'that',
 'you\ncover',
 '',
 'nakedness',
 'and',
 'pass',
 'out',
 'of\nlove’s',
 't

Another thing you may have noticed is that some words contain a line break. Let's write a function to split those words. Our function will return the string split on the character `\n`. Write your function in the cell below.

In [25]:
def line_break(x):
    '''
    Input: A string
    Output: A list of strings split on the line break (\n) character
        
    Example:
    Input: 'the\nbeloved'
    Output: ['the', 'beloved']
    '''
    
    # your code here
    return x.split("\n")
line_break('the\nbeloved')

['the', 'beloved']

Apply the `line_break` function to the `prophet_reference` list. Name the new list `prophet_line`.

In [28]:
# your code here
prophet_line = list(map(line_break, prophet_reference))
prophet_line

[['sheaves'],
 ['of'],
 ['corn'],
 ['he'],
 ['gathers'],
 ['you'],
 ['unto', 'himself.', '', 'He'],
 ['threshes'],
 ['you'],
 ['to'],
 ['make'],
 ['you'],
 ['naked.', '', 'He'],
 ['sifts'],
 ['you'],
 ['to'],
 ['free'],
 ['you'],
 ['from'],
 ['your', 'husks.', '', 'He'],
 ['grinds'],
 ['you'],
 ['to'],
 ['whiteness.', '', 'He'],
 ['kneads'],
 ['you'],
 ['until'],
 ['you'],
 ['are'],
 ['pliant;', '', 'And'],
 ['then'],
 ['he'],
 ['assigns'],
 ['you'],
 ['to'],
 ['his'],
 ['sacred', 'fire,'],
 ['that'],
 ['you'],
 ['may'],
 ['become'],
 ['sacred'],
 ['bread', 'for'],
 ['God’s'],
 ['sacred'],
 ['feast.', '', '*****', '', 'All'],
 ['these'],
 ['things'],
 ['shall'],
 ['love'],
 ['do'],
 ['unto'],
 ['you', 'that'],
 ['you'],
 ['may'],
 ['know'],
 ['the'],
 ['secrets'],
 ['of'],
 ['your', 'heart,'],
 ['and'],
 ['in'],
 ['that'],
 ['knowledge'],
 ['become'],
 ['a', 'fragment'],
 ['of'],
 ['Life’s'],
 ['heart.', '', 'But'],
 ['if'],
 ['in'],
 ['your'],
 ['fear'],
 ['you'],
 ['would'],
 ['seek'

If you look at the elements of `prophet_line`, you will see that the function returned lists and not strings. Our list is now a list of lists. Flatten the list using list comprehension. Assign this new list to `prophet_flat`.
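The flattening step can be sketched on a small nested list. The sample data below is a short made-up excerpt in the same shape as `prophet_line`, and `itertools.chain.from_iterable` is shown only as an equivalent alternative, not as what the exercise asks for.

```python
from itertools import chain

nested = [["unto", "himself.", "", "He"], ["threshes"], ["you"]]

# Nested comprehension: the outer loop comes first, then the inner loop.
flat_comp = [word for sublist in nested for word in sublist]

# Equivalent result using the standard library.
flat_chain = list(chain.from_iterable(nested))

# Splitting on '\n' leaves empty strings behind; this optional step drops them.
non_empty = [word for word in flat_comp if word]

print(flat_comp)  # ['unto', 'himself.', '', 'He', 'threshes', 'you']
print(non_empty)  # ['unto', 'himself.', 'He', 'threshes', 'you']
```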

In [30]:
# your code here
# Given a list of lists t, the general pattern is:
# flat_list = [item for sublist in t for item in sublist]

prophet_flat = [word for lst in prophet_line for word in lst]
prophet_flat

['sheaves',
 'of',
 'corn',
 'he',
 'gathers',
 'you',
 'unto',
 'himself.',
 '',
 'He',
 'threshes',
 'you',
 'to',
 'make',
 'you',
 'naked.',
 '',
 'He',
 'sifts',
 'you',
 'to',
 'free',
 'you',
 'from',
 'your',
 'husks.',
 '',
 'He',
 'grinds',
 'you',
 'to',
 'whiteness.',
 '',
 'He',
 'kneads',
 'you',
 'until',
 'you',
 'are',
 'pliant;',
 '',
 'And',
 'then',
 'he',
 'assigns',
 'you',
 'to',
 'his',
 'sacred',
 'fire,',
 'that',
 'you',
 'may',
 'become',
 'sacred',
 'bread',
 'for',
 'God’s',
 'sacred',
 'feast.',
 '',
 '*****',
 '',
 'All',
 'these',
 'things',
 'shall',
 'love',
 'do',
 'unto',
 'you',
 'that',
 'you',
 'may',
 'know',
 'the',
 'secrets',
 'of',
 'your',
 'heart,',
 'and',
 'in',
 'that',
 'knowledge',
 'become',
 'a',
 'fragment',
 'of',
 'Life’s',
 'heart.',
 '',
 'But',
 'if',
 'in',
 'your',
 'fear',
 'you',
 'would',
 'seek',
 'only',
 'love’s',
 'peace',
 'and',
 'love’s',
 'pleasure,',
 '',
 'Then',
 'it',
 'is',
 'better',
 'for',
 'you',
 'that',

# Challenge 2 - Filtering

When printing out a few words from the book, we see that there are words that we may not want to keep if we choose to analyze the corpus of text. Below is a list of words that we would like to get rid of. Create a function that returns `False` if the word passed in is in the specified list and `True` otherwise.

In [31]:
def word_filter(x):
    '''
    Input: A string
    Output: True if the word is not in the specified list 
    and False if the word is in the list.
        
    Example:
    word list = ['and', 'the']
    Input: 'and'
    Output: False
    
    Input: 'John'
    Output: True
    '''
    
    word_list = ['and', 'the', 'a', 'an']
    
    # your code here
    return x not in word_list

word_filter('John')


True

Use the `filter()` function to filter out the words specified in the `word_filter()` function. Store the filtered list in the variable `prophet_filter`.

In [37]:
prophet_filter = list(filter(word_filter, prophet_flat))
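To see what `filter()` keeps and drops, here is a minimal sketch on a toy list. The names `STOP_WORDS` and `keep_word` and the sample words are hypothetical. Note that the comparison is case sensitive, so a capitalized `'The'` survives.

```python
STOP_WORDS = {"and", "the", "a", "an"}


def keep_word(word):
    # True means "keep this word"; filter() drops items that return False.
    return word not in STOP_WORDS


sample = ["the", "sea", "and", "The", "shore"]
kept = list(filter(keep_word, sample))

print(kept)  # ['sea', 'The', 'shore']
```

A set is used here instead of a list only because membership tests on a set do not slow down as the word list grows; for four words the difference is negligible.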

# Bonus Challenge

Rewrite the `word_filter` function above to not be case sensitive.

In [None]:
def word_filter_case(x):
    word_list = ['and', 'the', 'a', 'an']

    # your code here
    # Lowercase the word first so that 'The' and 'the' are treated alike.
    return x.lower() not in word_list

list(filter(word_filter_case, prophet_flat))

# Challenge 3 - Reducing

#### Now that we have significantly cleaned up our text corpus, let's use the `reduce()` function to put the words back together into one long string separated by spaces. 

We will start by writing a function that takes two strings and concatenates them together with a space between the two strings.

In [34]:
def concat_space(a, b):
    '''
    Input: Two strings
    Output: A single string separated by a space
        
    Example:
    Input: 'John', 'Smith'
    Output: 'John Smith'
    '''
    
    # your code here
    return a + ' ' + b


Use the function above to reduce the text corpus in the list `prophet_filter` into a single string. Assign this new string to the variable `prophet_string`.

In [38]:
# your code here
prophet_string = reduce(concat_space, prophet_filter)
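The reduction can be traced on a short list. In this sketch the sample words are made up, and `' '.join()` is shown only as a cross-check that produces the same string.

```python
from functools import reduce


def concat_space(a, b):
    # Join two strings with a single space between them.
    return a + " " + b


words = ["All", "these", "things"]

# reduce applies the function cumulatively, left to right:
# concat_space(concat_space("All", "these"), "things")
reduced = reduce(concat_space, words)

# The built-in string method gives the same result.
joined = " ".join(words)

print(reduced)  # All these things
```

One difference to keep in mind: `reduce()` raises a `TypeError` on an empty list unless an initial value is supplied, whereas `' '.join([])` simply returns an empty string.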

# Challenge 4 - Applying Functions to DataFrames

#### Our next step is to use the `apply()` function on a DataFrame to transform all of its cells.

To do this, we will connect to Ironhack's database and retrieve the data from the *pollution* database. Select the *beijing_pollution* table and retrieve its data.

In [59]:
# your code here
# The file is semicolon-delimited, so pass sep=';' to parse the columns correctly.
# pd.read_csv already returns a DataFrame, so no extra conversion is needed.
bp_df = pd.read_csv(r'C:\Users\tusha\Downloads\pollution2.csv', sep=';')

Let's look at the data using the `head()` function.

In [60]:
# your code here
bp_df.head(5)
#bp_df.shape

Unnamed: 0,Date,Day,DOW,Holiday,Influenza,Iws,PM10_lag0,PMc_lag0,Ir,NO2_lag0,...,all1,URTI0,URTI1,LRTI0,LRTI1,Asthma0,Asthma1,AECOPD0,AECOPD1,xxx
0,01/01/2013,1,2,1,1,,,,,,...,90,295,23,153,59,8.0,3.0,1.0,3.0,1
1,02/01/2013,2,3,1,1,,,,,,...,93,307,23,94,61,5.0,3.0,2.0,2.0,1
2,03/01/2013,3,4,1,1,,,,,,...,91,319,19,101,63,12.0,8.0,,,1
3,04/01/2013,4,5,0,1,,,,,,...,91,293,23,84,60,5.0,1.0,1.0,2.0,1
4,05/01/2013,5,6,0,1,,,,,,...,84,275,21,86,58,4.0,1.0,,2.0,1


The next step is to create a function that divides a cell by 24 to produce an hourly figure. Write the function below.

In [44]:
def hourly(x):
    '''
    Input: A numerical value
    Output: The value divided by 24
        
    Example:
    Input: 48
    Output: 2.0
    '''
    
    # your code here
    return x / 24

Apply this function to the columns `Iws`, `Is`, and `Ir`. Store this new dataframe in the variable `pm25_hourly`.

In [63]:
# your code here
pm25_hourly = bp_df[['Iws', 'Is', 'Ir']].apply(hourly)
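Here is a small sketch of what `apply()` does in this step, using a toy DataFrame. The values are invented for illustration and are not taken from the pollution data. `apply()` passes each selected column to the function as a Series, and the division is then carried out element by element.

```python
import pandas as pd


def hourly(x):
    # Works on a single number or on a whole Series.
    return x / 24


# Made-up values; None becomes NaN in a float column.
toy = pd.DataFrame({
    "Iws": [48.0, 24.0],
    "Is": [0.0, 12.0],
    "Ir": [72.0, None],
})

toy_hourly = toy[["Iws", "Is", "Ir"]].apply(hourly)
print(toy_hourly)
```

Missing values stay missing after the division, so no special handling is needed inside `hourly`.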

#### Our last challenge will be to create an aggregate function and apply it to a select group of columns in our dataframe.

Write a function that returns the standard deviation of a column divided by the length of a column minus 1. Since we are using pandas, do not use the `len()` function. One alternative is to use `count()`. Also, use the numpy version of standard deviation.
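The choice of the numpy version matters because the two libraries use different defaults: `np.std` divides by n (`ddof=0`), while `Series.std()` divides by n - 1 (`ddof=1`). The sketch below works through the docstring example `pd.Series([1,2,3,4])`; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4])

pop_sd = np.std(s.to_numpy())  # population formula: sqrt(5 / 4)
samp_sd = s.std()              # pandas default, sample formula: sqrt(5 / 3)
n = s.count()                  # number of non-null elements: 4

result = pop_sd / (n - 1)
print(round(float(result), 10))  # 0.3726779962
```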

In [76]:
def sample_sd(x):
    '''
    Input: A Pandas series of values
    Output: the standard deviation divided by the number of elements in the series minus 1
        
    Example:
    Input: pd.Series([1,2,3,4])
    Output: 0.3726779962
    '''
    
    # your code here
    # np.std uses the population formula (ddof=0) by default;
    # Series.count() returns the number of non-null elements.
    return np.std(x) / (x.count() - 1)

In [77]:
# Pass a Series; calling sample_sd() with no argument raises a TypeError.
sample_sd(pd.Series([1, 2, 3, 4]))