# Jupyter Notebook Introduction
## What is Jupyter Notebook?
- Jupyter Notebook is an interactive browser application that lets you combine code, the output of that code, data visualizations, and explanatory text in a single document (like the file you are reading now).
- Jupyter Notebook allows you to run Python in the browser. The browser only provides an interactive surface for typing code; the code itself is executed by the Python installation on your own computer, so you don't need Internet access to open and run a Jupyter Notebook file.
- A notebook typically contains multiple cells, into which you type your Python code. You then run a cell to execute its code, and any results appear below the cell.

## Complete the Following Task
- This file illustrates how to load a dataset and then navigate that dataset in Jupyter Notebook.
- Please follow the instructions above each empty cell and type the code in that cell. You can then execute the cell by placing your cursor in it and clicking "Run".

## Step 1: Import pandas package
---
```Python
import <packagename> as <shortname>
```
---
- We need to import `pandas` in order to load the dataset.
- `pandas` is a Python library for data manipulation and analysis.
- You can think of Python as your mobile phone's operating system, and `pandas` as an app installed on your phone that lets you do more with it. However, you need to open the app before you can do any task (data analysis). This is called importing a package.
- `import pandas as pd` means we `import` `pandas` and give it the short name `pd` for easy reference afterwards. You can use any short name you like, or none at all, but a short name makes later code easier to write, and `pd` is the common convention.
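
As a small illustration of the alias idea described above, the sketch below imports the same package under its full name and under a short name. It assumes `pandas` is installed; the variable name `same_module` is made up for this example.

```python
import pandas          # full name
import pandas as pd    # same package, bound to the short name pd

# Both names refer to the same module object, so either works as a prefix.
same_module = pd is pandas
print(same_module)
print(pd.DataFrame is pandas.DataFrame)
```

Either prefix reaches the same functions; the short name simply saves typing.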

In [14]:
import pandas as pd
import matplotlib.pyplot as plt

## Step 2: Load the dataset
---
```Python
df = pd.read_csv(<filelocation>)
```
---
- First, make sure you saved the dataset `compustat_fy2019.csv` in the same local folder as the notebook file (the one with your last name).
- We use the function `read_csv` from the `pandas` package (imported as `pd`) to open this dataset. In the code, we refer to this function as `pd.read_csv()`, which you can think of as telling Python "use the `read_csv` function found in `pd`". In this sense, the dot `.` after `pd` tells Python to look for the function inside the pandas *app*.
- Inside the parentheses of `pd.read_csv()`, we can specify parameters. In this example, we need to point Python to the dataset, which is stored in the notebook's working directory, i.e., `'compustat_fy2019.csv'` (remember that a string needs quotes around it).
- Lastly, we need to tell Python to keep the dataset in memory by giving it a name. We name the dataset `df` in this example; `df` is short for "`d`ata `f`rame", but you can choose any name you want. If you do not name it (in other words, you don't use the equal sign), then Python will not remember it!
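
The difference between reading a file with and without the equal sign can be sketched as follows. To keep the example self-contained, it reads from a small in-memory string instead of the real file; `csv_text` is a hypothetical stand-in holding two rows copied from the dataset, and `io.StringIO` is used only so that no file is needed.

```python
import io
import pandas as pd

# A tiny stand-in for the CSV file (two rows copied from the dataset).
csv_text = "tic,conm,at\nAIR,AAR CORP,2079.0\nABT,ABBOTT LABORATORIES,67887.0\n"

# Without "=": the data is read, but nothing is stored under a name.
pd.read_csv(io.StringIO(csv_text))

# With "=": the result is stored in the variable df for later use.
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (number of rows, number of columns)
```

With the real dataset, the file name in quotes takes the place of `io.StringIO(csv_text)`.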

In [15]:
df = pd.read_csv('compustat_fy2019.csv')

## Step 3: Navigate the dataset
---
```Python
df.head(<n>)
```
---
- We can have Python return the first *n* rows of the loaded dataset using `df.head(n)`. Again, think of `head(n)` as a function applied to the dataframe.
- Columns contain variables:
    - tic (ticker)
    - conm (company name)
    - datadate (fiscal year end)
    - fyear (fiscal year)
    - at (total assets)
    - lt (total liabilities)
    - teq (total equity)
    - revt (revenue)
    - ni (net income)
    - exchg (exchange code; 11 New York Stock Exchange; 12 American Stock Exchange; 14 NASDAQ)
- Rows represent companies
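
A minimal sketch of how `head` behaves, using a small frame built from five rows of the dataset (only two columns are kept for brevity). The variable names `first_three` and `default_head` are made up for this example.

```python
import pandas as pd

# Five rows copied from the dataset, for illustration.
df = pd.DataFrame({
    "tic": ["AIR", "AAL", "CECE", "ASA", "PNW"],
    "at": [2079.0, 59995.0, 408.637, 286.612, 18479.247],
})

first_three = df.head(3)   # first 3 rows
default_head = df.head()   # no argument: first 5 rows by default
print(first_three)
print(df.shape)            # (number of rows, number of columns)
```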

In [18]:
df.head(10)

| | tic | conm | datadate | fyear | at | lt | teq | revt | ni | exchg |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AIR | AAR CORP | 31MAY2020 | 2019 | 2079.0 | 1176.4 | 902.6 | 2089.3 | 4.4 | 11 |
| 1 | AAL | AMERICAN AIRLINES GROUP INC | 31DEC2019 | 2019 | 59995.0 | 60113.0 | -118.0 | 45768.0 | 1686.0 | 14 |
| 2 | CECE | CECO ENVIRONMENTAL CORP | 31DEC2019 | 2019 | 408.637 | 215.62 | 193.017 | 341.869 | 17.707 | 14 |
| 3 | ASA | ASA GOLD AND PRECIOUS METALS | 30NOV2019 | 2019 | 286.612 | 0.733 | 285.879 | 2.371 | 91.431 | 11 |
| 4 | PNW | PINNACLE WEST CAPITAL CORP | 31DEC2019 | 2019 | 18479.247 | 12926.059 | 5553.188 | 3471.209 | 538.32 | 11 |
| 5 | AAN | AARON'S INC | 31DEC2019 | 2019 | 3297.8 | 1560.541 | 1737.259 | 3947.656 | 31.472 | 11 |
| 6 | ABT | ABBOTT LABORATORIES | 31DEC2019 | 2019 | 67887.0 | 36586.0 | 31301.0 | 31904.0 | 3687.0 | 11 |
| 7 | ACU | ACME UNITED CORP | 31DEC2019 | 2019 | 110.749 | 55.044 | 55.705 | 142.457 | 5.514 | 12 |
| 8 | BKTI | BK TECHNOLOGIES CORP | 31DEC2019 | 2019 | 37.94 | 14.664 | 23.276 | 40.1 | -2.636 | 12 |
| 9 | AE | ADAMS RESOURCES & ENERGY INC | 31DEC2019 | 2019 | 330.842 | 179.201 | 151.641 | 1811.247 | 8.207 | 12 |


In [7]:
df.describe

<bound method NDFrame.describe of        tic                          conm   datadate  fyear         at  \
0      AIR                      AAR CORP  31MAY2020   2019   2079.000   
1      AAL   AMERICAN AIRLINES GROUP INC  31DEC2019   2019  59995.000   
2     CECE       CECO ENVIRONMENTAL CORP  31DEC2019   2019    408.637   
3      ASA  ASA GOLD AND PRECIOUS METALS  30NOV2019   2019    286.612   
4      PNW    PINNACLE WEST CAPITAL CORP  31DEC2019   2019  18479.247   
...    ...                           ...        ...    ...        ...   
4722  RNLX              RENALYTIX AI PLC  30JUN2019   2019      9.700   
4723  CTRM           CASTOR MARITIME INC  31DEC2019   2019     30.421   
4724  IMUX                   IMMUNIC INC  31DEC2019   2019     65.955   
4725  ARMP    ARMATA PHARMACEUTICALS INC  31DEC2019   2019     25.451   
4726   PSV   HERMITAGE OFFSHORE SERVICES  31DEC2019   2019    201.909   

             lt       teq       revt        ni  exchg  
0      1176.400   902.600   2089.
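
The output above begins with `<bound method ...>` because `df.describe` was typed without parentheses, so Python returned the function itself instead of running it. A minimal sketch of the difference, using a small frame built from three `at` values shown earlier (the names `method` and `summary` are made up for this example):

```python
import pandas as pd

df = pd.DataFrame({"at": [2079.0, 59995.0, 408.637]})

method = df.describe      # no parentheses: the method object, not called
summary = df.describe()   # parentheses: runs the method

print(callable(method))
print(summary)            # count, mean, std, min, quartiles, max
```

Running `df.describe()` on the full dataset would return summary statistics for its numeric columns in the same layout.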

In [8]:
df.info

<bound method DataFrame.info of        tic                          conm   datadate  fyear         at  \
0      AIR                      AAR CORP  31MAY2020   2019   2079.000   
1      AAL   AMERICAN AIRLINES GROUP INC  31DEC2019   2019  59995.000   
2     CECE       CECO ENVIRONMENTAL CORP  31DEC2019   2019    408.637   
3      ASA  ASA GOLD AND PRECIOUS METALS  30NOV2019   2019    286.612   
4      PNW    PINNACLE WEST CAPITAL CORP  31DEC2019   2019  18479.247   
...    ...                           ...        ...    ...        ...   
4722  RNLX              RENALYTIX AI PLC  30JUN2019   2019      9.700   
4723  CTRM           CASTOR MARITIME INC  31DEC2019   2019     30.421   
4724  IMUX                   IMMUNIC INC  31DEC2019   2019     65.955   
4725  ARMP    ARMATA PHARMACEUTICALS INC  31DEC2019   2019     25.451   
4726   PSV   HERMITAGE OFFSHORE SERVICES  31DEC2019   2019    201.909   

             lt       teq       revt        ni  exchg  
0      1176.400   902.600   2089.30
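
As with `describe`, `df.info` needs parentheses to run. A short sketch on a small frame copied from three rows of the dataset; the name `result` is made up for this example. Note that `info()` prints its summary directly rather than returning a table.

```python
import pandas as pd

df = pd.DataFrame({
    "tic": ["AIR", "AAL", "CECE"],
    "ni": [4.4, 1686.0, 17.707],
})

result = df.info()   # prints column names, non-null counts, and dtypes
print(result)        # None: info() prints instead of returning a value
```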

In [9]:
df.dtypes

tic          object
conm         object
datadate     object
fyear         int64
at          float64
lt          float64
teq         float64
revt        float64
ni          float64
exchg         int64
dtype: object
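
In the listing above, text columns such as `tic` show up as `object`, while numeric columns are `int64` or `float64`. Knowing the types lets you, for example, pick out only the numeric columns. The sketch below assumes a small frame copied from two rows of the dataset; `numeric` is a made-up variable name.

```python
import pandas as pd

df = pd.DataFrame({
    "tic": ["AIR", "AAL"],
    "fyear": [2019, 2019],
    "at": [2079.0, 59995.0],
})

print(df.dtypes)                               # one type per column
numeric = df.select_dtypes(include="number")   # keep numeric columns only
print(list(numeric.columns))
```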

In [13]:
df.value_counts("tic")

tic
A       1
ONTO    1
OPRX    1
OPRT    1
OPRA    1
       ..
FEIM    1
FEDU    1
FE      1
FDX     1
ZYXI    1
Name: count, Length: 4727, dtype: int64
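
Every ticker appears exactly once here, so all the counts are 1. Counting is more informative for a column with repeated values, such as `exchg`. A minimal sketch using the exchange codes of the ten rows shown by `df.head(10)` earlier (the name `counts` is made up for this example):

```python
import pandas as pd

# Exchange codes of the ten rows shown by df.head(10) earlier.
df = pd.DataFrame({"exchg": [11, 14, 14, 11, 11, 11, 11, 12, 12, 12]})

counts = df["exchg"].value_counts()   # sorted from most to least frequent
print(counts)
```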

In [17]:
df.plthist()

AttributeError: 'DataFrame' object has no attribute 'plthist'
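
The error above occurs because a DataFrame has no method called `plthist`. One way to draw a histogram is pandas' own plotting interface, which relies on matplotlib being installed. The sketch below is an assumption about what was intended, not the only option; it uses the `at` values of the ten rows shown earlier, and the choice of 5 bins is arbitrary.

```python
import pandas as pd

# Total assets of the ten rows shown by df.head(10) earlier.
df = pd.DataFrame({"at": [2079.0, 59995.0, 408.637, 286.612, 18479.247,
                          3297.8, 67887.0, 110.749, 37.94, 330.842]})

# plot.hist() draws a histogram with matplotlib and returns the Axes object.
ax = df["at"].plot.hist(bins=5)
ax.set_xlabel("at (total assets)")
```

In a notebook the chart appears below the cell. `df.hist()` is a related method that draws one histogram per numeric column.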