# Bar chart

## Setup

In [2]:
import pandas as pd
import altair as alt

We disable Altair's default row limit so that DataFrames with more than 5,000 rows can be plotted, using `alt.data_transformers.disable_max_rows()`:

In [3]:
alt.data_transformers.disable_max_rows()

DataTransformerRegistry.enable('default')

We also suppress warnings of the category `FutureWarning`:

In [4]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
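The following sketch shows the effect of this filter in isolation. `legacy_call` is a hypothetical function invented for the example. The `catch_warnings` context manager is used only so that the warnings can be counted and the filter settings are restored afterwards.

```python
import warnings


def legacy_call():
    # Hypothetical function that emits a FutureWarning,
    # as libraries do to announce upcoming API changes
    warnings.warn("this behaviour will change", FutureWarning)
    return 42


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning to start with
    legacy_call()
    n_before = len(caught)

    # Same filter as in the notebook cell: ignore FutureWarning only
    warnings.simplefilter(action="ignore", category=FutureWarning)
    legacy_call()                                # no longer recorded
    warnings.warn("still visible", UserWarning)  # other categories still pass
    n_after = len(caught)

print(n_before, n_after)
```

The filter is selective: only `FutureWarning` is dropped, while other categories such as `UserWarning` are still reported.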

## Data

### Import data

In [5]:
ROOT = "https://raw.githubusercontent.com/kirenz/datasets/master/"
DATA = "loans.csv"

df = pd.read_csv(ROOT + DATA)

### Data structure

Display the DataFrame with `df`:

In [6]:
df

|  | Unnamed: 0 | emp_title | emp_length | state | homeownership | annual_income | verified_income | debt_to_income | annual_income_joint | verification_income_joint | ... | sub_grade | issue_month | loan_status | initial_listing_status | disbursement_method | balance | paid_total | paid_principal | paid_interest | paid_late_fees |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 9594 | rn | 4.0 | TX | mortgage | 92000.00 | Source Verified | 19.28 |  |  | ... | B2 | Feb-2018 | Fully Paid | whole | Cash | 0.00 | 40013.788333 | 40000.00 | 13.79 | 0.0 |
| 1 | 3534 | retail store manager | 10.0 | NV | own | 62000.00 | Not Verified | 29.71 |  |  | ... | A2 | Jan-2018 | Current | whole | DirectPay | 3486.96 | 606.500000 | 513.04 | 93.46 | 0.0 |
| 2 | 7574 | supervisor | 3.0 | CA | rent | 43000.00 | Source Verified | 15.57 | 101000.0 | Not Verified | ... | B3 | Feb-2018 | Current | whole | Cash | 17258.09 | 1551.340000 | 941.91 | 609.43 | 0.0 |
| 3 | 129 | automotive service lane manager | 10.0 | WV | own | 172000.00 | Verified | 24.31 |  |  | ... | C4 | Feb-2018 | Current | whole | Cash | 34345.64 | 3399.460000 | 1654.36 | 1745.10 | 0.0 |
| 4 | 6937 | operations manager | 10.0 | SC | mortgage | 96355.33 | Verified | 9.44 |  |  | ... | D3 | Mar-2018 | Current | whole | Cash | 33934.09 | 2672.570000 | 1065.91 | 1606.66 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 195 | 9750 | sales manager | 10.0 | MA | mortgage | 90000.00 | Not Verified | 27.36 |  |  | ... | D2 | Jan-2018 | Current | whole | Cash | 14197.97 | 1891.950000 | 802.03 | 1089.92 | 0.0 |
| 196 | 5820 | design engineer | 8.0 | OH | mortgage | 110000.00 | Not Verified | 22.52 |  |  | ... | A1 | Jan-2018 | Current | whole | Cash | 19149.86 | 3299.650000 | 2850.14 | 449.51 | 0.0 |
| 197 | 3909 | qa | 10.0 | OR | mortgage | 90000.00 | Not Verified | 23.87 |  |  | ... | A4 | Mar-2018 | Current | whole | Cash | 33292.54 | 3322.210000 | 2707.46 | 614.75 | 0.0 |
| 198 | 8887 | community director | 8.0 | FL | rent | 70000.00 | Source Verified | 14.04 |  |  | ... | D4 | Mar-2018 | Current | whole | Cash | 14067.64 | 1647.340000 | 932.36 | 714.98 | 0.0 |


Show the column names, non-null counts and data types with `df.info()`:

In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 56 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   Unnamed: 0                        200 non-null    int64  
 1   emp_title                         182 non-null    object 
 2   emp_length                        182 non-null    float64
 3   state                             200 non-null    object 
 4   homeownership                     200 non-null    object 
 5   annual_income                     200 non-null    float64
 6   verified_income                   200 non-null    object 
 7   debt_to_income                    200 non-null    float64
 8   annual_income_joint               31 non-null     float64
 9   verification_income_joint         29 non-null     object 
 10  debt_to_income_joint              31 non-null     float64
 11  delinq_2y                         200 non-null    int64  
 12  months_s

### Data corrections

Change the data type of the variables `homeownership` and `application_type` from `object` to `category` with `.astype("category")`:

In [8]:
# Change data type from object to category
df['homeownership'] = df['homeownership'].astype("category")
df['application_type'] = df['application_type'].astype("category")
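As a small illustration of what the conversion does, the sketch below uses a hypothetical four-row DataFrame `toy` that stands in for the loans data. After `.astype("category")`, the column stores each distinct value once as a category and exposes them through the `.cat` accessor.

```python
import pandas as pd

# Hypothetical toy column standing in for df['homeownership']
toy = pd.DataFrame({"homeownership": ["rent", "own", "mortgage", "rent"]})

before = toy["homeownership"].dtype
toy["homeownership"] = toy["homeownership"].astype("category")
after = toy["homeownership"].dtype

print(before, "->", after)

# The distinct values are stored once, in sorted order
categories = list(toy["homeownership"].cat.categories)
print(categories)
```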

### Variable lists

Next, we select the variables we want to plot and store them in a smaller DataFrame, which keeps the plotting code simple.

In this example we only use the variable `homeownership`:

In [9]:
# make a list of variables you want to use
var_list = ['homeownership']

# create a new dataframe called source with only var_list
source = df[var_list]
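Note that indexing with a list of names returns a DataFrame even when the list holds a single name, whereas indexing with a plain string returns a Series. The sketch below illustrates the difference on a hypothetical two-row DataFrame `toy`; the names `as_frame` and `as_series` are chosen for the example only.

```python
import pandas as pd

# Hypothetical toy data standing in for the loans DataFrame
toy = pd.DataFrame({"homeownership": ["rent", "own"], "state": ["TX", "NV"]})

var_list = ["homeownership"]

as_frame = toy[var_list]          # list of names -> DataFrame
as_series = toy["homeownership"]  # single name   -> Series

print(type(as_frame).__name__, type(as_series).__name__)
```

Keeping `source` as a DataFrame is convenient here because it is passed directly to `alt.Chart` in the following cells.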

## Analysis

We start our analysis with Altair, a declarative statistical visualization library for Python, based on Vega and Vega-Lite.

Altair charts work out of the box in Jupyter Notebook, as long as there is a **web connection** to load the required JavaScript libraries.

The general pattern of an Altair call looks like this (`foo` is a placeholder for the mark type):


**a**lt.**C**hart().**m**ark_foo().**e**ncode() 


*You can remember the order of the building blocks with the acronym "**a.C.m.e**"*

```python
alt.Chart(DATAFRAME).mark_PLOT().encode(
    x=alt.X('VARIABLE'),
    y=alt.Y('VARIABLE')
)
```

Replace:

- DATAFRAME with your data (e.g., `source` or `df`)
- PLOT with the plot type of your choice (e.g., `bar` or `circle`)
- VARIABLE with the name of the variable you want to plot

### Standard bar chart

In [10]:
alt.Chart(source).mark_bar().encode(
    x=alt.X('homeownership'),
    y=alt.Y('homeownership', aggregate='count')
)
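The `aggregate='count'` argument makes each bar as tall as the number of rows in its category. As a cross-check, the same numbers can be computed in pandas with `value_counts()`. The sketch below uses a hypothetical six-row DataFrame `toy` rather than the loans data.

```python
import pandas as pd

# Hypothetical toy data standing in for `source`
toy = pd.DataFrame(
    {"homeownership": ["rent", "mortgage", "own", "mortgage", "rent", "mortgage"]}
)

# Number of rows per category: the bar heights of the chart
counts = toy["homeownership"].value_counts()
print(counts)
```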

### Sorted bar chart

In [11]:
alt.Chart(source).mark_bar().encode(
    x=alt.X('homeownership').sort('-y'), # sort
    y=alt.Y('homeownership', aggregate='count')
)

### Bar chart with properties

In [20]:
alt.Chart(source).mark_bar().encode(
    x=alt.X('homeownership').sort('-y'),
    y=alt.Y('homeownership', aggregate='count')
).properties( # properties
    title='This is a simple bar chart',
    width=400,
    height=300
)

### Bar chart with custom axes

In [21]:
alt.Chart(source).mark_bar().encode(
    x=alt.X('homeownership').sort('-y').axis(
        title="Homeownership", # title of x axis
        labelAngle=0, # angle of x axis text
        titleAnchor="start"), # adjustment of text
    y=alt.Y('homeownership', aggregate='count').axis(
        title="Count", # title of y axis
        titleAnchor="end")
).properties(
    title='This is a bar chart with custom axes',
    width=400,
    height=300
)

### Bar chart with custom axes and title

In [22]:
alt.Chart(source).mark_bar().encode(
    x=alt.X('homeownership').sort('-y').axis(
        title="Homeownership", 
        labelAngle=0,
        titleAnchor="start"),
    y=alt.Y('homeownership', aggregate='count').axis(
        title="Count",
        titleAnchor="end",
        grid=False) # no grid
).properties(
    title='This is a bar chart with custom axes and title',
    width=400,
    height=300
).configure_title( # custom title
    fontSize=16,
    font='Arial',
    color='black',
    anchor='start'
).configure_view(strokeWidth=0) # no border