# Bar chart

## Setup

- We use the "magic function" `%matplotlib inline` in the first cell of the notebook to enable inline plotting: plots are displayed directly below the cell that contains the plotting commands.

- Line-oriented magic functions (also called line magics) start with a percent sign (`%`), followed by their arguments on the rest of the line, without quotes or parentheses.

- We also disable Altair's data restrictions so that we can plot DataFrames with more than 5000 rows: `alt.data_transformers.disable_max_rows()`

In [1]:
%matplotlib inline

import pandas as pd
import altair as alt

alt.data_transformers.disable_max_rows()


We also want to ignore a specific warning:

In [2]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
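
To see what this filter does, here is a minimal sketch that uses only the standard library. The helper `emit_warnings` and both warning messages are made up for illustration; the point is that only `FutureWarning` is dropped, while other categories still get through.

```python
import warnings

def emit_warnings():
    """Raise one FutureWarning and one UserWarning (both made up for this demo)."""
    warnings.warn("this API will change", FutureWarning)
    warnings.warn("something else happened", UserWarning)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning by default
    # Same call as in the cell above: drop FutureWarning only
    warnings.simplefilter(action="ignore", category=FutureWarning)
    emit_warnings()

# Names of the warning categories that were not filtered out
caught_categories = [w.category.__name__ for w in caught]
print(caught_categories)
```

The filter added last is checked first, so the `ignore` rule for `FutureWarning` takes precedence over the catch-all rule.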

## Data

### Import data

In [None]:
ROOT = "https://raw.githubusercontent.com/kirenz/datasets/master/"
DATA = "loans.csv"

Import the data with pandas and name the DataFrame `df`:

In [3]:
### BEGIN SOLUTION
df = pd.read_csv(ROOT + DATA)
### END SOLUTION

In [None]:
"""Check that df has the expected number of rows and the columns used below"""
assert len(df) == 10000
assert {'homeownership', 'application_type'}.issubset(df.columns)

### Data structure

Display the DataFrame by evaluating `df`:

In [4]:
df

| | emp_title | emp_length | state | homeownership | annual_income | verified_income | debt_to_income | annual_income_joint | verification_income_joint | debt_to_income_joint | ... | sub_grade | issue_month | loan_status | initial_listing_status | disbursement_method | balance | paid_total | paid_principal | paid_interest | paid_late_fees |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | global config engineer | 3.0 | NJ | mortgage | 90000.0 | Verified | 18.01 | | | | ... | C3 | Mar-2018 | Current | whole | Cash | 27015.86 | 1999.33 | 984.14 | 1015.19 | 0.0 |
| 1 | warehouse office clerk | 10.0 | HI | rent | 40000.0 | Not Verified | 5.04 | | | | ... | C1 | Feb-2018 | Current | whole | Cash | 4651.37 | 499.12 | 348.63 | 150.49 | 0.0 |
| 2 | assembly | 3.0 | WI | rent | 40000.0 | Source Verified | 21.15 | | | | ... | D1 | Feb-2018 | Current | fractional | Cash | 1824.63 | 281.80 | 175.37 | 106.43 | 0.0 |
| 3 | customer service | 1.0 | PA | rent | 30000.0 | Not Verified | 10.16 | | | | ... | A3 | Jan-2018 | Current | whole | Cash | 18853.26 | 3312.89 | 2746.74 | 566.15 | 0.0 |
| 4 | security supervisor | 10.0 | CA | rent | 35000.0 | Verified | 57.96 | 57000.0 | Verified | 37.66 | ... | C3 | Mar-2018 | Current | whole | Cash | 21430.15 | 2324.65 | 1569.85 | 754.80 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | owner | 10.0 | TX | rent | 108000.0 | Source Verified | 22.28 | | | | ... | A4 | Jan-2018 | Current | whole | Cash | 21586.34 | 2969.80 | 2413.66 | 556.14 | 0.0 |
| 9996 | director | 8.0 | PA | mortgage | 121000.0 | Verified | 32.38 | | | | ... | D3 | Feb-2018 | Current | whole | Cash | 9147.44 | 1456.31 | 852.56 | 603.75 | 0.0 |
| 9997 | toolmaker | 10.0 | CT | mortgage | 67000.0 | Verified | 45.26 | 107000.0 | Source Verified | 29.57 | ... | E2 | Feb-2018 | Current | fractional | Cash | 27617.65 | 4620.80 | 2382.35 | 2238.45 | 0.0 |
| 9998 | manager | 1.0 | WI | mortgage | 80000.0 | Source Verified | 11.99 | | | | ... | A1 | Feb-2018 | Current | whole | Cash | 21518.12 | 2873.31 | 2481.88 | 391.43 | 0.0 |


Show the column names, non-null counts, and data types with `df.info()`:

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 55 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   emp_title                         9167 non-null   object 
 1   emp_length                        9183 non-null   float64
 2   state                             10000 non-null  object 
 3   homeownership                     10000 non-null  object 
 4   annual_income                     10000 non-null  float64
 5   verified_income                   10000 non-null  object 
 6   debt_to_income                    9976 non-null   float64
 7   annual_income_joint               1495 non-null   float64
 8   verification_income_joint         1455 non-null   object 
 9   debt_to_income_joint              1495 non-null   float64
 10  delinq_2y                         10000 non-null  int64  
 11  months_since_last_delinq          4342 non-null   float64
 12  earli

### Data corrections

Change the data type of the variables `homeownership` and `application_type` from object to category with `.astype("category")`:

In [6]:
# Change data format from object to category
df['homeownership'] = df['homeownership'].astype("category")
df['application_type'] = df['application_type'].astype("category")
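
The effect of the conversion can be sketched on a small, made-up frame. The name `toy` and its values are assumptions for illustration only and are not read from `loans.csv`.

```python
import pandas as pd

# Hypothetical toy data standing in for the loans dataset
toy = pd.DataFrame({"homeownership": ["mortgage", "rent", "rent", "own"]})

# Convert the text column to the categorical data type
toy["homeownership"] = toy["homeownership"].astype("category")

dtype_after = str(toy["homeownership"].dtype)
# The distinct levels are stored once, sorted alphabetically by default
categories = list(toy["homeownership"].cat.categories)

print(dtype_after)
print(categories)
```

A categorical column stores each distinct level once and refers to it by an integer code, which typically saves memory for columns with few distinct values.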

### Variable lists

Next, we select the variables we want to use, which keeps the plotting code simple.

Here we only need the variable `homeownership`.

In [7]:
# make a list of variables you want to use
var_list = ['homeownership']

# create a new dataframe called source with only var_list
source = df[var_list]

## Analysis

We start our analysis with Altair, a declarative statistical visualization library for Python, based on Vega and Vega-Lite.

The general pattern of an Altair chart looks like this (`foo` is a placeholder for the mark type):


**a**lt.**C**hart().**m**ark_foo().**e**ncode() 


*You can remember the order of the method calls with the acronym "**a.C.m.e**"*

```python
alt.Chart(DATAFRAME).mark_PLOT().encode(
    x=alt.X('VARIABLE'),
    y=alt.Y('VARIABLE')
)
```

Replace:

- DATAFRAME with your data (e.g., `source` or `df`)
- PLOT with the plot type of your choice (e.g., `bar` or `circle`)
- VARIABLE with the name of the variable you want to plot

### Standard bar chart

In [8]:
alt.Chart(source).mark_bar().encode(
    x=alt.X('homeownership'),
    y=alt.Y('count(homeownership)')
)

### Sorted bar chart

In [25]:
alt.Chart(source).mark_bar().encode(
    x=alt.X('homeownership',
            sort='-y'), # sort bars by descending y value
    y=alt.Y('count(homeownership)')
)
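
`sort='-y'` orders the categories on the x axis by descending y value, which here is the count. The expected bar order can be checked with pandas; the sketch below assumes a made-up frame `toy` rather than the real data.

```python
import pandas as pd

# Hypothetical toy data: 3 x mortgage, 2 x rent, 1 x own
toy = pd.DataFrame({
    "homeownership": ["rent", "mortgage", "own", "mortgage", "rent", "mortgage"]
})

# value_counts() sorts by count in descending order by default
counts = toy["homeownership"].value_counts()
bar_order = counts.index.tolist()

print(counts)
print(bar_order)
```

Running `source['homeownership'].value_counts()` on the actual data gives the order in which the bars of the sorted chart should appear.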

### Bar chart with properties

In [26]:
alt.Chart(source).mark_bar().encode(
    x=alt.X('homeownership',
            sort='-y'),
    y=alt.Y('count(homeownership)')
).properties( # properties
    title='This is a simple bar chart',
    width=300,
    height=150
)

### Bar chart with custom axes

In [27]:
alt.Chart(source).mark_bar().encode(
    x=alt.X('homeownership', 
            sort='-y',
            axis=alt.Axis(title="Homeownership", # title of x axis
                          labelAngle=0, # angle of x axis text
                          titleAnchor="start")), # adjustment of text
    y=alt.Y('count(homeownership)',
            axis=alt.Axis(title="Count", # title of y axis
                          titleAnchor="end")) # position of the title along the axis
).properties(
    title='This is a bar chart with custom axes',
    width=300,
    height=150
)

### Bar chart with custom axes and title

In [28]:
alt.Chart(source).mark_bar().encode(
    x=alt.X('homeownership',
            sort='-y',
            axis=alt.Axis(title="Homeownership", 
                          labelAngle=0,
                          titleAnchor="start")),
    y=alt.Y('count(homeownership)',
            axis=alt.Axis(title="Count",
                          titleAnchor="end"))
).properties(
    title='This is a bar chart with custom axes and title',
    width=300,
    height=150
).configure_title( # custom title
    fontSize=16,
    font='Arial',
    color='black',
    anchor='start'
)