# Introduction to Mito

Now that you have a solid foundation in Jupyter, Python and Pandas, it's time to do some more advanced data manipulation, the kind of work that looks and feels like report automation.

To make our lives easier, we're going to use Mito to generate a lot of this Pandas code for us.

Mito is a spreadsheet that lives inside your Jupyter notebook. For every edit you make to your data using Mito, it generates the equivalent Pandas code for you.

In [None]:
import mitosheet
mitosheet.sheet()


In the rest of this section, we'll practice using Mito to build some simple reports. 


## Exercise 1: Cleaning messy data
Instructions:
* Import the CSV file `data2_messy_data.csv`.
* Convert the dates that are stored as strings to datetime format
* Remove the substring `-Stock` from the `Stock` column using spreadsheet formulas
* Cast the `Transaction_Price` type from `string` to `float`
* Delete columns that have only one unique value
* Rename unhelpfully long column headers
* Export the new dataset as `data_2_cleaned.csv`

In [None]:
import mitosheet
mitosheet.sheet()

In [None]:
data2_messy_data

💡 Notice that for every edit you made in Mito, it generated the equivalent Python code for you. 

There's a lot of pandas syntax in the Mito-generated code that we haven't covered yet. Take a couple of minutes to read the code and see if you can understand what each line does.
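
If the generated code is hard to follow, it may help to compare it against a small hand-written version of the same kinds of steps. The sketch below is not Mito's output. It uses a tiny made-up dataframe, and the `Date` and `Currency` columns, the sample values, and the new `Price` header are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample rows; the real file has different columns and values
df = pd.DataFrame({
    "Date": ["2021-05-03", "2021-05-04", "2021-05-05"],
    "Stock": ["AAPL-Stock", "MSFT-Stock", "AAPL-Stock"],
    "Transaction_Price": ["125.50", "251.20", "127.10"],
    "Currency": ["USD", "USD", "USD"],
})

# Convert date strings to datetime values
df["Date"] = pd.to_datetime(df["Date"])

# Remove the literal substring "-Stock" from every value in the Stock column
df["Stock"] = df["Stock"].str.replace("-Stock", "", regex=False)

# Cast the price column from string to float
df["Transaction_Price"] = df["Transaction_Price"].astype(float)

# Drop columns that contain only one unique value
single_valued = [col for col in df.columns if df[col].nunique() == 1]
df = df.drop(columns=single_valued)

# Rename a column header (the new name is just an example)
df = df.rename(columns={"Transaction_Price": "Price"})

print(df.dtypes)
```

Each statement maps to one of the instructions above, which is roughly the shape you can expect the generated code to have: one or two lines per edit, applied in order.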


🧑‍💻 Now, try running the Mito generated Python code and notice that the dataframe is exactly the same as the data in Mito.


💡 Be aware that each time you rerun the `mitosheet.sheet()` call or make an edit in the Mito spreadsheet, Mito regenerates the Python code in the following code cell. That means any edits you make to the generated code will be overwritten as soon as you interact with Mito again. If you want to edit the generated code, either delete the Mito spreadsheet or copy and paste the Python code into a new cell before making changes to it.


## Exercise 2: Building a pivot table

Instructions:

* Import the loans file, `data3_loans.csv`, again.
* Convert the `issue_date` column to a date
* Calculate the difference between annual income and loan amount
* Filter the dataset down to just the car loans 
* Create a pivot table that shows the average loan amount for each month
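
For reference, here is one way the steps above could look in plain pandas. This is a rough sketch on made-up rows, not the code Mito will generate. Only `issue_date` comes from the instructions; the `purpose`, `loan_amount`, and `annual_income` column names and all values are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the loans data
loans = pd.DataFrame({
    "issue_date": ["2019-01-15", "2019-01-20", "2019-02-10", "2019-02-25"],
    "purpose": ["car", "car", "car", "house"],
    "loan_amount": [10000, 14000, 8000, 200000],
    "annual_income": [50000, 62000, 45000, 90000],
})

# Convert the issue_date column to a date
loans["issue_date"] = pd.to_datetime(loans["issue_date"])

# Difference between annual income and loan amount
loans["income_minus_loan"] = loans["annual_income"] - loans["loan_amount"]

# Keep only the car loans
car_loans = loans[loans["purpose"] == "car"]

# Add a month label, then average the loan amount within each month
car_loans = car_loans.assign(issue_month=car_loans["issue_date"].dt.to_period("M"))
pivot = car_loans.pivot_table(index="issue_month", values="loan_amount", aggfunc="mean")

print(pivot)
```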


In [None]:
import pandas as pd
loans = pd.read_csv('data3_loans.csv')

mitosheet.sheet(loans)

💡 Notice that instead of importing data from a CSV or Excel file, we passed a dataframe directly to the `mitosheet.sheet()` function call.

That means that as long as you can get your data into a pandas dataframe, you can edit it in Mito. You can create the dataframe by connecting to a database and running a SQL query, scraping a website, etc.
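
As a minimal sketch of the database route, the example below builds a throwaway in-memory SQLite database and loads a query result into a dataframe. The table name, columns, and values are all hypothetical, and a real database would need its own connection details.

```python
import sqlite3
import pandas as pd

# In-memory SQLite database standing in for a real one (hypothetical table and values)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (id INTEGER, purpose TEXT, loan_amount REAL)")
conn.executemany(
    "INSERT INTO loans VALUES (?, ?, ?)",
    [(1, "car", 10000.0), (2, "house", 200000.0), (3, "car", 8000.0)],
)
conn.commit()

# Run a SQL query and load the result straight into a dataframe
loans_from_db = pd.read_sql_query("SELECT * FROM loans WHERE purpose = 'car'", conn)
conn.close()

print(loans_from_db)

# Inside a notebook with Mito installed, the dataframe can then be passed to Mito:
# import mitosheet
# mitosheet.sheet(loans_from_db)
```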

## Exercise 3: Simplifying data

Instructions: Import both sheets from `data4_equity_transactions.xlsx`. Find the most recent trade for each stock by combining the transaction records for May and June.
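
One common pandas pattern for "most recent row per group" is to stack the tables, sort by date, and keep the last row for each key. The sketch below illustrates that idea on made-up data. The `Stock`, `Date`, and `Price` column names and all values are assumptions, and the code Mito generates for your steps may look different.

```python
import pandas as pd

# Hypothetical stand-ins for the May and June sheets
may = pd.DataFrame({
    "Stock": ["AAPL", "MSFT"],
    "Date": pd.to_datetime(["2021-05-10", "2021-05-12"]),
    "Price": [126.0, 243.0],
})
june = pd.DataFrame({
    "Stock": ["AAPL", "TSLA"],
    "Date": pd.to_datetime(["2021-06-08", "2021-06-02"]),
    "Price": [127.5, 605.0],
})

# Stack the two months into a single table
combined = pd.concat([may, june], ignore_index=True)

# Sort by date, then keep only the last (most recent) row for each stock
latest = (
    combined.sort_values("Date")
    .drop_duplicates(subset="Stock", keep="last")
    .sort_values("Stock")
    .reset_index(drop=True)
)

print(latest)
```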

In [None]:
mitosheet.sheet()

## Exercise 4: Calculating new metrics
Instructions: Import `data5_apple_stock.csv` and calculate the week-over-week change in price. Export the result as an Excel file.

Hint: Try using a combination of spreadsheet formulas and deduplication
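
To make the hint concrete, here is one possible reading of it in plain pandas: label each row with its week, deduplicate so that only the last price of each week remains, then compare each week with the previous one. The `Date` and `Close` column names, the sample prices, and the output filename in the final comment are assumptions, not the contents of the real file.

```python
import pandas as pd

# Hypothetical daily prices covering three weeks
prices = pd.DataFrame({
    "Date": pd.to_datetime([
        "2021-05-03", "2021-05-07",
        "2021-05-10", "2021-05-14",
        "2021-05-17", "2021-05-21",
    ]),
    "Close": [100.0, 110.0, 108.0, 121.0, 120.0, 108.9],
})

# Label each row with the week it falls in
prices["Week"] = prices["Date"].dt.to_period("W")

# Deduplicate on the week label, keeping the last trading day of each week
weekly = (
    prices.sort_values("Date")
    .drop_duplicates(subset="Week", keep="last")
    .reset_index(drop=True)
)

# Week-over-week change relative to the previous week's closing price
weekly["Prev_Close"] = weekly["Close"].shift(1)
weekly["WoW_Change"] = (weekly["Close"] - weekly["Prev_Close"]) / weekly["Prev_Close"]

print(weekly)

# Exporting needs an Excel engine such as openpyxl to be installed:
# weekly.drop(columns="Week").to_excel("apple_weekly.xlsx", index=False)
```

The first week has no previous week to compare against, so its change is `NaN`.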


In [None]:
mitosheet.sheet()

💡 At this point, you might have a bunch of open Mito spreadsheets in your notebook. If you prefer working in a smaller document, go to `Edit` > `Clear All Outputs`. This closes all of the open Mito spreadsheets, but because you still have the `analysis_to_replay` id, the next time you run the `mitosheet.sheet()` call, it will repopulate the Mito spreadsheet to the same state.

## Exercise 5: Comparing data sources
* Instructions:
    * Import transaction records from two different data sources: Eagle and an Excel file that is manually tracked.
    * Merge them together on a column called `Transaction ID`
    * Outside of Mito, write the Python code for performing the following checks. 
        * If either dataset does not have a value in the `Share Quantity` column, set the value of the `Check` column to "Action Required. Missing Data". 
        * If the quantities are the same, set the value of the `Check` column to "Matching. No action required."
        * If the quantities do not match, set the value of the `Check` column to "Action Required. Quantity does not match"
    * Import the updated dataframe into Mito and separate the data into three different sheets, one for each condition.
    * Download an Excel file with three tabs  
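
When two tables share a column name other than the merge key, a pandas merge keeps both copies and adds a suffix to each, which matters for the `Share Quantity` checks. The sketch below shows this on made-up rows. The suffixes, the outer join, and the sample values are assumptions, so look at the column names in your own merged dataframe before writing the checks.

```python
import pandas as pd

# Hypothetical stand-ins for the two data sources
excel = pd.DataFrame({
    "Transaction ID": [1, 2, 3],
    "Share Quantity": [100.0, 50.0, None],
})
eagle = pd.DataFrame({
    "Transaction ID": [1, 2, 4],
    "Share Quantity": [100.0, 55.0, 20.0],
})

# An outer merge keeps transactions that appear in only one of the sources
merged = pd.merge(
    excel,
    eagle,
    on="Transaction ID",
    how="outer",
    suffixes=("_excel", "_eagle"),
)

print(merged)
```

Transactions missing from one source end up with `NaN` in that source's quantity column, which is what the "Missing Data" check needs to detect.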

In [None]:
transactions_excel = pd.read_csv('data6_transactions_excel.csv')
transactions_eagle = pd.read_csv('data6_transactions_eagle.csv')

mitosheet.sheet(transactions_excel, transactions_eagle)

In [None]:
def check_row(row):
    # TODO: Replace me with your code.
    # Hint: pd.isnull(x) checks whether a cell is NaN, which might be helpful
    pass  # placeholder so the cell runs; remove it once you add your code

In [None]:
# df3 is assumed to be the merged dataframe produced in the previous step
df3['Check'] = df3.apply(check_row, axis=1)
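
If row-wise `apply` is new to you, the toy example below shows the mechanics without solving the exercise. With `axis=1`, pandas calls the function once per row, and you can read a cell with `row["column name"]`. The `price` column and the `label_price` function are hypothetical.

```python
import pandas as pd

# Toy dataframe with one missing value
toy = pd.DataFrame({"price": [10.0, None, 7.5]})

def label_price(row):
    # row is a pandas Series holding one row of the dataframe
    if pd.isnull(row["price"]):
        return "missing"
    return "ok"

# axis=1 applies the function to each row instead of each column
toy["label"] = toy.apply(label_price, axis=1)

print(toy)
```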

In [None]:
mitosheet.sheet(df3)