### Data Acquisition Exercises

In [None]:
import os

import pandas as pd
import seaborn as sns

from pydataset import data

from env import get_db_url

4. In a Jupyter notebook, `classification_exercises.ipynb`, use a Python module containing datasets (pydataset or seaborn datasets) as a source for the iris data. Create a pandas dataframe, `df_iris`, from this data.

    - print the first 3 rows
    - print the number of rows and columns (shape)
    - print the column names
    - print the data type of each column
    - print the summary statistics for each of the numeric variables. Would you
      recommend rescaling the data based on these statistics?

In [None]:
df_iris = data('iris')
df_iris.head(3)

In [None]:
df_iris.shape

In [None]:
list(df_iris.columns)

In [None]:
df_iris.dtypes

In [None]:
df_iris.describe()
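
One way to approach the rescaling question is to compare the spread of the numeric columns reported by `describe()`. The sketch below uses a small made-up frame (the values are illustrative, not real iris rows) and a hypothetical helper, `range_ratio`, which reports how many times wider the widest column's range is than the narrowest. A ratio near 1 suggests the columns already share a similar scale; a large ratio suggests rescaling is worth considering.

```python
import pandas as pd

def range_ratio(df: pd.DataFrame) -> float:
    """Ratio of the widest to the narrowest numeric column range (max - min)."""
    numeric = df.select_dtypes(include='number')
    ranges = numeric.max() - numeric.min()
    return float(ranges.max() / ranges.min())

# Illustrative values only, not actual iris measurements.
example = pd.DataFrame({
    'length_cm': [4.0, 5.0, 6.0, 7.0],
    'width_cm': [2.0, 2.5, 3.0, 4.0],
    'label': ['a', 'b', 'a', 'b'],
})
ratio = range_ratio(example)
print(ratio)
```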

5. Read the data from [this Google Sheet](https://docs.google.com/spreadsheets/d/1Uhtml8KY19LILuZsrDtlsHHDC9wuDGUSe8LTEwvdI5g/edit?usp=sharing) into a dataframe, `df_google`.

    - print the first 3 rows
    - print the number of rows and columns
    - print the column names
    - print the data type of each column
    - print the summary statistics for each of the numeric variables
    - print the unique values for each of your categorical variables

In [None]:
sheet_url = \
'https://docs.google.com/spreadsheets/d/1Uhtml8KY19LILuZsrDtlsHHDC9wuDGUSe8LTEwvdI5g/edit#gid=341089357'
sheet_url = sheet_url.replace('/edit#gid=', '/export?format=csv&gid=')
df_google = pd.read_csv(sheet_url)
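
The `.replace` call above turns the sheet's edit link into a CSV export link. The same idea can be sketched as a reusable function that fails loudly when the link has no `gid`. The name `sheet_csv_url` is hypothetical; the `/export?format=csv&gid=` path is the one used in the cell above.

```python
import re

def sheet_csv_url(edit_url: str) -> str:
    """Convert a Google Sheets '/edit#gid=N' link into a CSV export link."""
    match = re.search(r'/d/([^/]+)/edit.*?[#?&]gid=(\d+)', edit_url)
    if match is None:
        raise ValueError('URL must contain a sheet id and a gid')
    sheet_id, gid = match.groups()
    return f'https://docs.google.com/spreadsheets/d/{sheet_id}/export?format=csv&gid={gid}'

edit_url = ('https://docs.google.com/spreadsheets/d/'
            '1Uhtml8KY19LILuZsrDtlsHHDC9wuDGUSe8LTEwvdI5g/edit#gid=341089357')
csv_url = sheet_csv_url(edit_url)
print(csv_url)
```

The resulting string can be passed to `pd.read_csv` exactly as `sheet_url` is above.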

In [None]:
df_google.head(3)

In [None]:
df_google.shape

In [None]:
list(df_google.columns)

In [None]:
df_google.info()

In [None]:
df_google.dtypes

In [None]:
df_google.describe()

In [None]:
#print the unique values for each of your categorical variables
list(df_google.Name.unique())

In [None]:
list(df_google.Sex.unique())

In [None]:
df_google.Ticket.unique()

In [None]:
df_google.Cabin.unique()

In [None]:
list(df_google.Embarked.unique())
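
Columns such as `Name`, `Ticket`, and `Cabin` have many distinct values, so printing every one is hard to read. A possible alternative, sketched below on a small made-up frame, lists the unique values of each non-numeric column only when there are few of them and reports a count otherwise. The helper name `summarize_categoricals` and the `max_listed` threshold are assumptions for illustration.

```python
import pandas as pd

def summarize_categoricals(df: pd.DataFrame, max_listed: int = 5) -> dict:
    """Map each non-numeric column to its unique values, or to a count if there are too many."""
    summary = {}
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            continue  # skip numeric columns
        values = df[col].dropna().unique().tolist()
        if len(values) <= max_listed:
            summary[col] = values
        else:
            summary[col] = f'{len(values)} unique values'
    return summary

# Made-up rows for illustration only.
example = pd.DataFrame({
    'Sex': ['male', 'female', 'female', 'male'],
    'Ticket': ['A1', 'B2', 'C3', 'D4'],
    'Age': [22.0, 38.0, 26.0, 35.0],
})
summary = summarize_categoricals(example, max_listed=3)
print(summary)
```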

6. Download the previous exercise's file as an Excel file (File → Download → Microsoft Excel). Read the downloaded file into a dataframe named `df_excel`.

    - assign the first 100 rows to a new dataframe, `df_excel_sample`
    - print the number of rows of your original dataframe
    - print the first 5 column names
    - print the column names that have a data type of `object`
    - compute the range for each of the numeric variables

In [None]:
df_excel = pd.read_excel('train.xlsx')
df_excel.head(1)

In [None]:
df_excel_sample = df_excel[:100]
df_excel_sample.head()

In [None]:
df_excel.index.size

In [None]:
df_excel_sample.index.size

In [None]:
df_excel.columns[:5]

In [None]:
dtypes_excel = df_excel.dtypes.reset_index()
dtypes_excel

In [None]:
# Print the column names that have a data type of object.
df_excel.select_dtypes(include='object').columns.tolist()

In [None]:
df_excel.Fare.dtype

In [None]:
# Compute the range (max - min) for each numeric variable.
print('{:<20}|{:>7}'.format('Variable', 'Range'))
print('_' * 28 + '\n')
# select_dtypes keeps only the numeric columns, where max() - min() applies.
for col in df_excel.select_dtypes(include='number').columns:
    col_series = df_excel[col]
    print('{:<20}|{:>7}'.format(col, round(col_series.max() - col_series.min(), 2)))
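
The same ranges can also be computed without an explicit loop by subtracting the column-wise minimum from the column-wise maximum. The sketch below uses made-up values rather than the downloaded file, so it runs on its own.

```python
import pandas as pd

# Made-up rows for illustration only.
example = pd.DataFrame({
    'Age': [22.0, 38.0, 26.0],
    'Fare': [7.25, 71.28, 7.93],
    'Sex': ['male', 'female', 'female'],
})

# Keep numeric columns, then take max - min for every column at once.
numeric = example.select_dtypes(include='number')
ranges = (numeric.max() - numeric.min()).round(2)
print(ranges)
```

The result is a Series indexed by column name, which is convenient for sorting or further filtering.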

Make a new Python module, `acquire.py`, to hold the following data acquisition functions:

7. Make a function named `get_titanic_data` that returns the Titanic data from the _Codeup Data Science Database_ as a pandas dataframe.


8. Make a function named `get_iris_data` that returns the data from the `iris_db` database on the _Codeup Data Science Database_ as a pandas dataframe. The returned dataframe should include the actual name of the species in addition to the `species_id`s.

9. Make a function named `get_telco_data` that returns the data from the `telco_churn` database on the _Codeup Data Science Database_. In your SQL, be sure to join all 4 tables together, so that the resulting dataframe contains all the contract, payment, and internet service options.

10. Once your `get_titanic_data`, `get_iris_data`, and `get_telco_data` functions are written, add caching to them. Edit the beginning of each function to check for its local file: `titanic.csv`, `iris.csv`, or `telco.csv`. If the file exists, read the dataframe from it. If it doesn't, run the SQL query with pandas to create the dataframe, then write the dataframe to a .csv file with the appropriate name.
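
The caching logic in exercise 10 is the same for all three functions, so one option is to factor it into a helper. The sketch below is one possible shape, not the required solution: `cached_csv` is a hypothetical name, and `fetch` stands in for whatever code runs the SQL query (for example a `pd.read_sql` call using a URL from `get_db_url`, whose signature is not shown in this notebook). A stand-in `fetch` that builds a tiny made-up frame is used so the example runs without database access.

```python
import os
import tempfile
from typing import Callable

import pandas as pd

def cached_csv(filename: str, fetch: Callable[[], pd.DataFrame]) -> pd.DataFrame:
    """Read `filename` if it exists; otherwise call `fetch` and save the result there."""
    if os.path.isfile(filename):
        return pd.read_csv(filename)
    df = fetch()
    # index=False avoids an extra 'Unnamed: 0' column when the file is read back.
    df.to_csv(filename, index=False)
    return df

calls = []

def fake_fetch() -> pd.DataFrame:
    """Stand-in for the SQL query; records each call so caching can be observed."""
    calls.append(1)
    return pd.DataFrame({'passenger_id': [0, 1], 'survived': [0, 1]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'titanic.csv')
    first = cached_csv(path, fake_fetch)   # file missing: fetch runs, CSV is written
    second = cached_csv(path, fake_fetch)  # file present: read from disk, fetch is skipped

print(len(calls))
```

In `acquire.py`, each `get_*_data` function could then pass its own filename and a function that runs its query.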

In [None]:
import acquire as ac

In [None]:
titanic = ac.get_titanic_data()
titanic.head()

In [None]:
iris = ac.get_iris_data()
iris
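
Exercise 8 asks for the species name alongside `species_id`, which calls for a join between the measurements and a species lookup table. The sketch below demonstrates such a join on an in-memory SQLite database so that it runs without credentials. The table and column names (`measurements`, `species`, `species_name`) are assumptions and may differ from the actual `iris_db` schema, and the rows are made up.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE species (species_id INTEGER PRIMARY KEY, species_name TEXT);
    CREATE TABLE measurements (measurement_id INTEGER PRIMARY KEY,
                               sepal_length REAL, species_id INTEGER);
    INSERT INTO species VALUES (1, 'setosa'), (2, 'versicolor');
    INSERT INTO measurements VALUES (1, 5.1, 1), (2, 7.0, 2), (3, 4.9, 1);
""")

# Join each measurement to its species row to pick up the readable name.
query = """
    SELECT m.*, s.species_name
    FROM measurements AS m
    JOIN species AS s ON m.species_id = s.species_id
    ORDER BY m.measurement_id
"""
iris_example = pd.read_sql(query, conn)
conn.close()
print(iris_example)
```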

In [None]:
telco = ac.get_telco_data()
telco.head()