# Bar chart

## Setup

In [2]:
import pandas as pd
import altair as alt

- We disable Altair's default row limit so that we can plot DataFrames with more than 5000 rows: `alt.data_transformers.disable_max_rows()`

In [3]:
alt.data_transformers.disable_max_rows()

DataTransformerRegistry.enable('default')

We also suppress warnings of the category `FutureWarning`:

In [4]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

## Data

### Import data

In [38]:
ROOT = "https://raw.githubusercontent.com/kirenz/datasets/master/"
DATA = "loan50.csv"

df = pd.read_csv(ROOT + DATA)

### Data structure

Display the dataframe with `df`

In [39]:
df

Unnamed: 0,state,emp_length,term,homeownership,annual_income,verified_income,debt_to_income,total_credit_limit,total_credit_utilized,num_cc_carrying_balance,loan_purpose,loan_amount,grade,interest_rate,public_record_bankrupt,loan_status,has_second_income,total_income
0,NJ,3.0,60,rent,59000,Not Verified,0.557525,95131,32894,8,debt_consolidation,22000,B,10.9,0,Current,False,59000
1,CA,10.0,36,rent,60000,Not Verified,1.305683,51929,78341,2,credit_card,6000,B,9.92,1,Current,False,60000
2,SC,,36,mortgage,75000,Verified,1.05628,301373,79221,14,debt_consolidation,25000,E,26.3,0,Current,False,75000
3,CA,0.0,36,rent,75000,Not Verified,0.574347,59890,43076,10,credit_card,6000,B,9.92,0,Current,False,75000
4,OH,4.0,60,mortgage,254000,Not Verified,0.23815,422619,60490,2,home_improvement,25000,B,9.43,0,Current,False,254000
5,IN,6.0,36,mortgage,67000,Source Verified,1.077045,349825,72162,4,home_improvement,6400,B,9.92,0,Current,False,67000
6,NY,2.0,36,rent,28800,Source Verified,0.099722,15980,2872,1,debt_consolidation,3000,D,17.09,0,Current,False,28800
7,MO,10.0,36,mortgage,80000,Not Verified,0.350913,258439,28073,3,credit_card,14500,A,6.08,0,Current,False,80000
8,FL,6.0,60,rent,34000,Not Verified,0.6975,87705,23715,10,credit_card,10000,A,7.97,0,Current,False,34000
9,FL,3.0,60,mortgage,80000,Source Verified,0.166854,330394,32036,4,debt_consolidation,18500,C,12.62,1,Current,True,192000


Show the column names, non-null counts, and data types with `df.info()`

In [40]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 18 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   state                    50 non-null     object 
 1   emp_length               48 non-null     float64
 2   term                     50 non-null     int64  
 3   homeownership            50 non-null     object 
 4   annual_income            50 non-null     int64  
 5   verified_income          50 non-null     object 
 6   debt_to_income           50 non-null     float64
 7   total_credit_limit       50 non-null     int64  
 8   total_credit_utilized    50 non-null     int64  
 9   num_cc_carrying_balance  50 non-null     int64  
 10  loan_purpose             50 non-null     object 
 11  loan_amount              50 non-null     int64  
 12  grade                    50 non-null     object 
 13  interest_rate            50 non-null     float64
 14  public_record_bankrupt   50 

### Data corrections

Convert the variable `homeownership` from data type `object` to `category` with `.astype("category")`

In [41]:
# Change data format from object to category
df['homeownership'] = df['homeownership'].astype("category")
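To see what the conversion does, here is a minimal sketch on a small hand-made column. The four rows in `toy` are made up for illustration and are not taken from `loan50.csv`. It applies `.astype("category")` and then inspects the resulting dtype and the category labels.

```python
import pandas as pd

# Toy data for illustration only; not rows from loan50.csv
toy = pd.DataFrame({"homeownership": ["rent", "mortgage", "rent", "own"]})

# Convert the text column from object to category
toy["homeownership"] = toy["homeownership"].astype("category")

# Inspect the new dtype and the distinct category labels
print(toy["homeownership"].dtype)
print(list(toy["homeownership"].cat.categories))
```

A categorical column stores each distinct label only once, which makes it explicit that the variable takes a small, fixed set of values.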

### Variable lists

Next, we select the variables we want to plot. Working with a smaller dataframe keeps the plotting code simple.

We only use the variable `homeownership`.

In [42]:
# make a list of variables you want to use
var_list = ['homeownership']

# create a new dataframe called source with only var_list
source = df[var_list]
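Selecting with a list of column names returns a DataFrame, even when the list holds a single name, whereas selecting with a bare string returns a Series. The sketch below shows the difference on made-up toy data (the `loan_amount` values are illustrative only).

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    "homeownership": ["rent", "mortgage", "rent"],
    "loan_amount": [1000, 2000, 3000],
})

var_list = ["homeownership"]

as_frame = toy[var_list]           # list of names -> DataFrame
as_series = toy["homeownership"]   # single name   -> Series

print(type(as_frame).__name__, as_frame.shape)
print(type(as_series).__name__, as_series.shape)
```

`alt.Chart()` expects tabular data such as a DataFrame, which is why the list-based selection is used for `source`.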

## Analysis

We start our analysis with Altair, a declarative statistical visualization library for Python, based on Vega and Vega-Lite.

Altair charts work out of the box in Jupyter Notebook, as long as there is a **web connection** to load the required JavaScript libraries.

The general pattern of an Altair call looks like this (`foo` is a placeholder for the mark type):


**a**lt.**C**hart().**m**ark_foo().**e**ncode() 


*You can remember the order of the method calls with the acronym "**a.C.m.e**"*

```python
alt.Chart(DATAFRAME).mark_PLOT().encode(
    x=alt.X('VARIABLE'),
    y=alt.Y('VARIABLE')
)

```

Replace:

- DATAFRAME with your data (e.g., `source` or `df`)
- PLOT with the plot type of your choice (e.g., `bar` or `circle`)
- VARIABLE with the name of the variable you want to plot

### Standard bar chart

In [43]:
alt.Chart(source).mark_bar().encode(
    x=alt.X('homeownership'),
    y=alt.Y('homeownership', aggregate='count')
)
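The bar heights come from `aggregate='count'`, which counts the rows in each category. To sanity-check the chart, the same counts can be computed directly in pandas with `value_counts()`. The sketch below uses made-up toy data rather than `source`.

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame(
    {"homeownership": ["rent", "mortgage", "rent", "own", "rent"]}
)

# Number of rows per category; this corresponds to the bar heights
counts = toy["homeownership"].value_counts()
print(counts)
```

Running `source["homeownership"].value_counts()` on the real data should give the values shown on the y axis of the chart.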

### Sorted bar chart

In [33]:
alt.Chart(source).mark_bar().encode(
    x=alt.X('homeownership').sort('-y'), # sort bars by descending y value
    y=alt.Y('homeownership', aggregate='count')
)

### Bar chart with properties

In [34]:
alt.Chart(source).mark_bar().encode(
    x=alt.X('homeownership').sort('-y'),
    y=alt.Y('homeownership', aggregate='count')
).properties( # chart-level properties: title and size in pixels
    title='This is a simple bar chart',
    width=400,
    height=300
)

### Bar chart with custom axes

In [35]:
alt.Chart(source).mark_bar().encode(
    x=alt.X('homeownership').sort('-y').axis(
        title="Homeownership", # title of the x axis
        labelAngle=0, # angle of the x axis tick labels (0 = horizontal)
        titleAnchor="start"), # anchor the axis title at the start of the axis
    y=alt.Y('homeownership', aggregate='count').axis(
        title = "Count",
        titleAnchor="end")
).properties(
    title='This is a bar chart with custom axes',
    width=400,
    height=300
)

### Bar chart with custom axes and title

In [36]:
alt.Chart(source).mark_bar().encode(
    x=alt.X('homeownership').sort('-y').axis(
        title="Homeownership", 
        labelAngle=0,
        titleAnchor="start"),
    y=alt.Y('homeownership', aggregate='count').axis(
        title = "Count",
        titleAnchor="end",
        grid=False) # hide the grid lines
).properties(
    title='This is a bar chart with custom axes and title',
    width=400,
    height=300
).configure_title( # custom title
    fontSize=16,
    font='Arial',
    color='black',
    anchor='start'
).configure_view(strokeWidth=0) # no border around the plot area