### Import libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Python Data Science Toolbox (Part 1)

### User defined function

Write a simple function
In the last video, Hugo described the basics of how to define a function. You will now write your own function!

Define a function, `shout()`, which simply prints out a string with three exclamation marks `'!!!'` at the end. The code for the `square()` function that we wrote earlier is found below. You can use it as a pattern to define `shout()`.
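
A minimal stand-in for that `square()` pattern is sketched here. The exact original is not reproduced in these notes, so the body and the value `4` are assumptions chosen only to show the shape of a no-parameter function that prints its result.

```python
# Hypothetical stand-in for the square() pattern; the original is not shown in these notes
def square():
    """Print the square of a fixed value."""
    new_value = 4 ** 2  # assumed example value
    print(new_value)

# Call square
square()
```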

In [None]:
# Define the function shout
def shout():
    """Print a string with three exclamation marks"""
    # Concatenate the strings: shout_word
    shout_word = 'congratulations' + '!!!'

    # Print shout_word
    print(shout_word)

# Call shout
shout()

Single-parameter functions
Congratulations! You have successfully defined and called your own function! That's pretty cool.

In the previous exercise, you defined and called the function `shout()`, which printed out a string concatenated with `'!!!'`. You will now update `shout()` by adding a parameter so that it can accept and process any string argument passed to it. Also note that `shout(word)`, the part of the header that specifies the function name and parameter(s), is known as the signature of the function. You may encounter this term in the wild!

In [None]:
# Define shout with the parameter, word
def shout(word):
    """Print a string with three exclamation marks"""
    # Concatenate the strings: shout_word
    shout_word = word + '!!!'

    # Print shout_word
    print(shout_word)

# Call shout with the string 'congratulations'
shout('congratulations')

Functions that return single values
You're getting very good at this! Try your hand at another modification to the `shout()` function so that it now returns a single value instead of printing within the function. Recall that the `return` keyword lets you return values from functions. Parts of the function `shout()`, which you wrote earlier, are shown. Returning values is generally more desirable than printing them out because, as you saw earlier, a `print()` call assigned to a variable has type `NoneType`.

In [None]:
# Define shout with the parameter, word
def shout(word):
    """Return a string with three exclamation marks"""
    # Concatenate the strings: shout_word
    shout_word = word + '!!!'

    # Replace print with return
    return shout_word

# Pass 'congratulations' to shout: yell
yell = shout('congratulations')

# Print yell
print(yell)
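
The `NoneType` remark can be checked directly. The sketch below uses two hypothetical helper names, `shout_print` and `shout_return`, to compare what a caller receives from a function that prints with one that returns.

```python
def shout_print(word):
    """Print the shouted word; the function itself returns None."""
    print(word + '!!!')

def shout_return(word):
    """Return the shouted word to the caller."""
    return word + '!!!'

printed = shout_print('congratulations')    # prints, but nothing is handed back
returned = shout_return('congratulations')  # the string is handed back

# Inspect what each call gave to the caller
print(type(printed))
print(type(returned))
```

Only the returned value can be reused later, for example passed to another function or stored in a list.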

### Multiple parameters and return values


Functions with multiple parameters
Hugo discussed the use of multiple parameters in defining functions in the last lecture. You are now going to use what you've learned to modify the `shout()` function further. Here, you will modify `shout()` to accept two arguments. Parts of the function `shout()`, which you wrote earlier, are shown.

In [None]:
# Define shout with parameters word1 and word2
def shout(word1, word2):
    """Concatenate strings with three exclamation marks"""
    # Concatenate word1 with '!!!': shout1
    shout1 = word1 + '!!!'

    # Concatenate word2 with '!!!': shout2
    shout2 = word2 + '!!!'

    # Concatenate shout1 with shout2: new_shout
    new_shout = shout1 + shout2

    # Return new_shout
    return new_shout

# Pass 'congratulations' and 'you' to shout(): yell
yell = shout('congratulations', 'you')

# Print yell
print(yell)
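
The section heading also mentions multiple return values. One common way to get them, sketched here under the assumption that a tuple is acceptable, is to pack the results into a tuple and unpack it at the call site. The name `shout_all` is hypothetical.

```python
def shout_all(word1, word2):
    """Return a tuple with '!!!' appended to each word."""
    shout1 = word1 + '!!!'
    shout2 = word2 + '!!!'

    # Pack both results into a single tuple
    return (shout1, shout2)

# Unpack the returned tuple into two variables
yell1, yell2 = shout_all('congratulations', 'you')

print(yell1)
print(yell2)
```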

### `lambda` function

Writing a lambda function you already know

Some function definitions are simple enough that they can be converted to a lambda function. By doing this, you write fewer lines of code, which is pretty awesome and will come in handy, especially when you're writing and maintaining big programs. In this exercise, you will use what you know about lambda functions to convert a function that does a simple task into a lambda function. Take a look at this function definition:
<code>
def echo_word(word1, echo):
    """Concatenate echo copies of word1."""
    words = word1 * echo
    return words</code>
    
The function `echo_word` takes 2 parameters: a string value, `word1` and an integer value, `echo`. It returns a string that is a concatenation of `echo` copies of `word1`. Your task is to convert this simple function into a lambda function.

In [None]:
# Define echo_word as a lambda function: echo_word
echo_word = lambda word1, echo: word1 * echo

# Call echo_word: result
result = echo_word('hey', 5)

# Print result
print(result)
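
As a quick sanity check, the `def` form and the lambda form can be run side by side. The names `echo_word_def` and `echo_word_lambda` are used here only to keep the two versions apart.

```python
def echo_word_def(word1, echo):
    """Concatenate echo copies of word1."""
    return word1 * echo

# Same behaviour written as a lambda
echo_word_lambda = lambda word1, echo: word1 * echo

# Both forms should agree for the same inputs
same = echo_word_def('hey', 5) == echo_word_lambda('hey', 5)
print(same)
```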

# [pandas Foundations](https://learn.datacamp.com/courses/pandas-foundations)

## Review of pandas DataFrames

In [None]:
aapl=pd.read_csv('data/datacamp/AAPL.csv')

In [None]:
aapl.head()

In [None]:
aapl.dtypes

In [None]:
aapl['Date']=pd.to_datetime(aapl.Date)

In [None]:
aapl.dtypes

In [None]:
aapl.set_index('Date', inplace=True)

In [None]:
aapl.tail()
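
The same steps can be tried without the `AAPL.csv` file. The sketch below reads a few hypothetical rows from an in-memory string; the column names and values are assumptions and are not taken from the real data.

```python
import io
import pandas as pd

# Hypothetical sample rows; not taken from AAPL.csv
csv_text = """Date,Low,Close
2020-01-02,73.19,75.09
2020-01-03,74.13,74.36
2020-01-06,73.19,74.95
"""

prices = pd.read_csv(io.StringIO(csv_text))
print(prices.dtypes)  # 'Date' is not a datetime column yet

# Convert the column to datetimes, then promote it to the index
prices['Date'] = pd.to_datetime(prices['Date'])
prices = prices.set_index('Date')

print(type(prices.index))
```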

### Indexes and columns

In [None]:
aapl.shape

In [None]:
aapl.columns

In [None]:
type(aapl.columns)

In [None]:
aapl.index

In [None]:
type(aapl.index)

### Slicing

In [None]:
aapl.iloc[:5, :]

In [None]:
aapl.iloc[-5:, :]

### `info()`

In [None]:
aapl.info()

### Broadcasting

In [None]:
aapl.iloc[::3,-2]=np.nan

In [None]:
aapl.head()

In [None]:
aapl.info()
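
To see which cells the slice `iloc[::3, -2]` touches, here is a small sketch on a hypothetical seven-row frame: `::3` selects rows 0, 3 and 6, and `-2` selects the second-to-last column.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 7 rows and 3 float columns
df = pd.DataFrame({'a': np.arange(7.0),
                   'b': np.arange(7.0) * 10,
                   'c': np.arange(7.0) * 100})

# The single value np.nan is broadcast to every selected cell
df.iloc[::3, -2] = np.nan

# Count the missing values in the affected column
n_missing = df['b'].isna().sum()
print(n_missing)
```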

### Series

In [None]:
low=aapl['Low']
type(low)

In [None]:
low.head()

In [None]:
lows=low.values
type(lows)
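
The `.values` attribute works as shown; the pandas documentation recommends `Series.to_numpy()` for the same purpose. A minimal sketch with hypothetical numbers:

```python
import numpy as np
import pandas as pd

# Hypothetical values standing in for the 'Low' column
low = pd.Series([73.19, 74.13, 73.19], name='Low')

# Extract the underlying data as a NumPy array
lows = low.to_numpy()

print(type(low))
print(type(lows))
```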

## Building DataFrames from scratch

### DataFrames from `dict` (1)

In [None]:
data = {'weekday': ['Sun', 'Sun', 'Mon', 'Mon'],
        'city': ['Austin', 'Dallas', 'Austin', 'Dallas'],
        'visitors': [139, 237, 326, 456],
        'signups': [7, 12, 3, 5]}
users=pd.DataFrame(data)
users

### DataFrames from `dict` (2)

In [None]:
cities = ['Austin', 'Dallas', 'Austin', 'Dallas']
signups = [7, 12, 3, 5]
visitors = [139, 237, 326, 456]
weekdays = ['Sun', 'Sun', 'Mon', 'Mon']
list_labels = ['city', 'signups', 'visitors', 'weekday']
list_cols = [cities, signups, visitors, weekdays]
zipped = list(zip(list_labels, list_cols))
zipped

In [None]:
data=dict(zipped)
data

In [None]:
users=pd.DataFrame(data)
users

### Broadcasting

In [None]:
users['fees'] = 0 # Broadcasts to entire column
print(users)

### Broadcasting with a dict

In [None]:
heights = [ 59.0, 65.2, 62.9, 65.4, 63.7, 65.7, 64.1 ]
data = {'height': heights, 'sex': 'M'}
results = pd.DataFrame(data)
results

### Index and columns

In [None]:
results.columns = ['height (in)', 'sex']
results.index = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
results

## Importing and exporting data

In [2]:
filepath='data/datacamp/ISSN_D_tot.csv'
sunspots=pd.read_csv(filepath)
sunspots.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 72103 entries, 0 to 72102
Data columns (total 6 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   1818      72103 non-null  int64  
 1   01        72103 non-null  int64  
 2   01.1      72103 non-null  int64  
 3   1818.004  72103 non-null  float64
 4    -1       72103 non-null  int64  
 5   1         72103 non-null  int64  
dtypes: float64(1), int64(5)
memory usage: 3.3 MB


In [3]:
sunspots.iloc[10:20, :]

Unnamed: 0,1818,01,01.1,1818.004,-1,1
10,1818,1,12,1818.034,-1,1
11,1818,1,13,1818.037,22,1
12,1818,1,14,1818.04,-1,1
13,1818,1,15,1818.042,-1,1
14,1818,1,16,1818.045,-1,1
15,1818,1,17,1818.048,46,1
16,1818,1,18,1818.051,59,1
17,1818,1,19,1818.053,63,1
18,1818,1,20,1818.056,-1,1
19,1818,1,21,1818.059,-1,1


### Using header keyword

In [4]:
sunspots=pd.read_csv(filepath, header=None)
sunspots.iloc[10:20,:]

Unnamed: 0,0,1,2,3,4,5
10,1818,1,11,1818.031,-1,1
11,1818,1,12,1818.034,-1,1
12,1818,1,13,1818.037,22,1
13,1818,1,14,1818.04,-1,1
14,1818,1,15,1818.042,-1,1
15,1818,1,16,1818.045,-1,1
16,1818,1,17,1818.048,46,1
17,1818,1,18,1818.051,59,1
18,1818,1,19,1818.053,63,1
19,1818,1,20,1818.056,-1,1


### Using names keyword

In [5]:
col_names=['year', 'month', 'day', 'dec_date', 'sunspots', 'definite']
sunspots=pd.read_csv(filepath, header=None, names=col_names)
sunspots.iloc[10:15, :]

Unnamed: 0,year,month,day,dec_date,sunspots,definite
10,1818,1,11,1818.031,-1,1
11,1818,1,12,1818.034,-1,1
12,1818,1,13,1818.037,22,1
13,1818,1,14,1818.04,-1,1
14,1818,1,15,1818.042,-1,1
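
The effect of `header` and `names` can be reproduced without the data file. The sketch below uses three hypothetical rows shaped like the sunspot file, read from an in-memory string.

```python
import io
import pandas as pd

# Three hypothetical rows shaped like the sunspot file (no header line)
raw_text = (
    "1818,01,01,1818.004, -1,1\n"
    "1818,01,02,1818.007, -1,1\n"
    "1818,01,13,1818.037, 22,1\n"
)
col_names = ['year', 'month', 'day', 'dec_date', 'sunspots', 'definite']

# Default: the first data row is consumed as the header
default_read = pd.read_csv(io.StringIO(raw_text))

# header=None keeps every row as data and labels columns 0..5
no_header = pd.read_csv(io.StringIO(raw_text), header=None)

# names supplies readable column labels
named = pd.read_csv(io.StringIO(raw_text), header=None, names=col_names)

print(len(default_read), len(no_header))
print(list(named.columns))
```

The default read ends up one row short, which matches the way the first `info()` output above reports data values as column names.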


### Using `na_values` keyword (1)

In [6]:
sunspots = pd.read_csv(filepath, header=None,
                       names=col_names, na_values=' -1')
sunspots.iloc[10:15, :]

Unnamed: 0,year,month,day,dec_date,sunspots,definite
10,1818,1,11,1818.031,,1
11,1818,1,12,1818.034,,1
12,1818,1,13,1818.037,22.0,1
13,1818,1,14,1818.04,,1
14,1818,1,15,1818.042,,1


### Using `na_values` keyword (2)

In [7]:
sunspots = pd.read_csv(filepath, header=None,
                       names=col_names, na_values=' -1')
sunspots.iloc[10:15, :]

Unnamed: 0,year,month,day,dec_date,sunspots,definite
10,1818,1,11,1818.031,,1
11,1818,1,12,1818.034,,1
12,1818,1,13,1818.037,22.0,1
13,1818,1,14,1818.04,,1
14,1818,1,15,1818.042,,1


### Using `na_values` keyword (3)

In [13]:
sunspots = pd.read_csv(filepath, header=None,
                       names=col_names, na_values={'sunspots':[' -1']})
sunspots.iloc[10:15, :]

Unnamed: 0,year,month,day,dec_date,sunspots,definite
10,1818,1,11,1818.031,,1
11,1818,1,12,1818.034,,1
12,1818,1,13,1818.037,22.0,1
13,1818,1,14,1818.04,,1
14,1818,1,15,1818.042,,1
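
The marker passed to `na_values` has to match the raw text of the field, including the leading space. A minimal sketch on hypothetical in-memory rows, comparing a plain read with the per-column dict form:

```python
import io
import pandas as pd

# Hypothetical rows shaped like the sunspot file; missing counts are written as ' -1'
raw_text = (
    "1818,01,11,1818.031, -1,1\n"
    "1818,01,13,1818.037, 22,1\n"
    "1818,01,14,1818.040, -1,1\n"
)
col_names = ['year', 'month', 'day', 'dec_date', 'sunspots', 'definite']

# Without na_values the marker is read as the integer -1
raw = pd.read_csv(io.StringIO(raw_text), header=None, names=col_names)

# The dict form applies the marker to the 'sunspots' column only
cleaned = pd.read_csv(io.StringIO(raw_text), header=None, names=col_names,
                      na_values={'sunspots': [' -1']})

print(raw['sunspots'].tolist())
print(cleaned['sunspots'].isna().sum())
```

Restricting the marker to one column avoids turning a legitimate `-1` in another column into a missing value.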


### Using `parse_dates` keyword

In [15]:
sunspots=pd.read_csv(filepath, header=None, names=col_names,
                    na_values={'sunspots': ['-1']},
                    parse_dates=[0,1,2])
sunspots.iloc[10:15, :]

Unnamed: 0,year,month,day,dec_date,sunspots,definite
10,1818-01-01,1,11,1818.031,-1,1
11,1818-01-01,1,12,1818.034,-1,1
12,1818-01-01,1,13,1818.037,22,1
13,1818-01-01,1,14,1818.04,-1,1
14,1818-01-01,1,15,1818.042,-1,1
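
In the output above, `parse_dates=[0,1,2]` handled each listed column on its own, so `year` became `1818-01-01` while `month` and `day` stayed as integers. If the goal is a single date per row, one option (an alternative to `parse_dates`, sketched on hypothetical rows) is to assemble it after reading with `pd.to_datetime`, which accepts a DataFrame with `year`, `month` and `day` columns.

```python
import pandas as pd

# Hypothetical rows using the same column names as the sunspot data
parts = pd.DataFrame({'year': [1818, 1818, 1818],
                      'month': [1, 1, 1],
                      'day': [11, 12, 13]})

# Build one timestamp per row from the year/month/day columns
dates = pd.to_datetime(parts[['year', 'month', 'day']])

print(dates)
```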


### Inspecting DataFrame

## Plotting with pandas

In [None]:
aapl = pd.read_csv('data/datacamp/AAPL.csv', index_col='Date',
                   parse_dates=True)
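
With a datetime index in place, a column can be plotted directly from pandas. The sketch below uses hypothetical values rather than the real `AAPL.csv` data, and selects the non-interactive `Agg` backend only so that it runs without a display; inside a notebook that line can be dropped.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: no display available; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical prices on a daily datetime index
idx = pd.date_range('2020-01-01', periods=5, freq='D')
prices = pd.DataFrame({'Low': [73.2, 74.1, 73.2, 74.9, 76.0]}, index=idx)

# Series.plot draws a line plot against the index and returns the Axes
ax = prices['Low'].plot(title='Low price (sample data)')
ax.set_ylabel('Low')
```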