# Client table

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import pprint
import missingno as msno
from helper_functions import open_table_list_columns, groupby_percent, groupby_plotsize
import os
DATADIR = os.getenv('DATADIR')

## Read in data 

In [None]:
client = open_table_list_columns(DATADIR, 'Client')

In [None]:
print("There are {} rows in client and {} unique clients (Serial).\nIt is {} that each row uniquely identifies a client.".format(client.shape[0], client.Serial.nunique(), client.shape[0] == client.Serial.nunique()))

## Explore missing

In [None]:
msno.heatmap(client)

In [None]:
msno.matrix(client.sample(250))

In [None]:
msno.dendrogram(client, orientation='left')
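
The missingno plots are visual, so a numeric summary can be a useful complement. A minimal sketch, using a small invented frame (`toy`) in place of `client`; the values are made up for illustration only:

```python
import pandas as pd

# Toy stand-in for the client table; values are invented for illustration.
toy = pd.DataFrame({
    "Serial": [1, 2, 3, 4],
    "Gender": ["M", None, "F", "F"],
    "Religion": [None, None, None, "None stated"],
})

# Fraction of missing values per column, most incomplete column first.
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)
```

The same two calls should work on `client` directly, assuming missing values are stored as NaN/None rather than as a placeholder string such as "Not known".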

### Status

In [None]:
groupby_plotsize(client, 'Status')

In [None]:
groupby_percent(client, 'Status', "Serial")

### Services

In [None]:
print("There are {} services in Addaction".format(client.Service.nunique()))

In [None]:
print("There are {} project codes in Addaction".format(client.ProjectCode.nunique()))

### Gender
Be wary of this variable: the coding for M and F switched from 0 to 1 at some point!

In [None]:
groupby_plotsize(client, 'Gender')

In [None]:
groupby_percent(client, 'Gender', "Serial")

### Relationship status

In [None]:
groupby_plotsize(client, 'RelationshipStatus')

In [None]:
groupby_percent(client, 'RelationshipStatus', 'Serial')

Are these categories mutually exclusive?

In [None]:
client.groupby('RelationshipStatus').size().sum()

In [None]:
print("There are {} clients with more than 1 relationship status".format((client.groupby('Serial')['RelationshipStatus'].nunique() > 1).sum()))
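
To test mutual exclusivity, it helps to count distinct statuses per client rather than rows per client. A minimal sketch on invented data (the `toy` frame stands in for `client`):

```python
import pandas as pd

# Toy stand-in for the client table; values are invented for illustration.
toy = pd.DataFrame({
    "Serial": [1, 1, 2, 2, 3],
    "RelationshipStatus": ["Single", "Single", "Single", "Married", None],
})

# Number of distinct non-missing statuses recorded for each client.
statuses_per_client = toy.groupby("Serial")["RelationshipStatus"].nunique()

# Clients with more than one distinct status.
multi_status = statuses_per_client[statuses_per_client > 1]
print(multi_status)
```

In the toy data, client 1 has two rows but only one status, so a plain row count would overstate the overlap; only client 2 is flagged.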

These could be collapsed for modelling. What about:
- Married/Civil Partnership + Married + Civil Partnership + Cohabiting + With Partner
- Single + Separated + Divorced + Widowed + Separated or Divorced + Never Married
- Not known + missing

ONS uses:

Marital status indicates whether a person is legally married or not. This publication uses five categories of legal marital status:
- single, never married or civil partnered
- married, including separated (this category includes those in both opposite-sex and same-sex marriages)
- civil partnered, including separated
- divorced, including legally dissolved civil partners
- widowed, including surviving civil partners

### Ethnicity

In [None]:
groupby_plotsize(client, 'Ethnic_Origin')

In [None]:
groupby_percent(client, 'Ethnic_Origin', 'Serial')

These need to be collapsed. How about:

- White British + White British (English) + White British (Scottish) + Other White + White: Polish + White irish + White British (N.Irish) + White British (Welsh) + White: Gypsy / Traveller
- Other black + Caribbean + African + Black Back Scottish or Black British + African Carribean or Black: Black Black Scottish
- White & Black Caribbean + White & Black Afircan + Other mixed + White and Asian
- Other Asian + Indian + Pakistani + Bangladeshi + Chinese
- Other + Other ethnic group: Arab
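
A grouping like the one above could be expressed as a lookup table plus a small helper. This is only a sketch: the group labels ("White", "Black", and so on), the "Not known" fallback, and the function name `collapse_ethnicity` are assumptions made for illustration, and only a handful of the raw values are listed.

```python
# Hypothetical mapping from raw Ethnic_Origin values to collapsed groups.
# Only a few raw values are listed; the group labels are illustrative.
ETHNICITY_GROUPS = {
    "White British": "White",
    "Other White": "White",
    "Caribbean": "Black",
    "African": "Black",
    "Other mixed": "Mixed",
    "White and Asian": "Mixed",
    "Indian": "Asian",
    "Pakistani": "Asian",
    "Other": "Other",
}


def collapse_ethnicity(value):
    """Return the collapsed group, or 'Not known' for missing or unmapped values."""
    # value != value is True only for NaN, which is how pandas stores missing text.
    if value is None or value != value:
        return "Not known"
    return ETHNICITY_GROUPS.get(str(value).strip(), "Not known")


print(collapse_ethnicity("Caribbean"))
print(collapse_ethnicity(float("nan")))
```

On the real table this could be applied with `client['Ethnic_Origin'].map(collapse_ethnicity)`. Sending unmapped values to "Not known" is a design choice; raising an error instead would surface raw categories that the mapping has missed.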


ONS uses different categories for England/Wales/Scotland/NI. Here is the England version:

What is your ethnic group?

Choose one option that best describes your ethnic group or background

White

1. English/Welsh/Scottish/Northern Irish/British
2. Irish
3. Gypsy or Irish Traveller
4. Any other White background, please describe

Mixed/Multiple ethnic groups

5. White and Black Caribbean
6. White and Black African
7. White and Asian
8. Any other Mixed/Multiple ethnic background, please describe

Asian/Asian British

9. Indian
10. Pakistani
11. Bangladeshi
12. Chinese
13. Any other Asian background, please describe

Black/African/Caribbean/Black British

14. African
15. Caribbean
16. Any other Black/African/Caribbean background, please describe

Other ethnic group

17. Arab
18. Any other ethnic group, please describe

### Other variables

In [None]:
groupby_plotsize(client, 'Religion')

In [None]:
groupby_plotsize(client, 'Sexuality')

In [None]:
groupby_plotsize(client, 'NoFixedAbode')

In [None]:
groupby_plotsize(client, 'Alert')

In [None]:
groupby_plotsize(client, 'Scrip_Index')

In [None]:
client.CaseNumber.nunique()

Still haven't looked at:
- CurrentEpisode
- ProjectCode
- Alert
- CurrentKeyworker
- CaseNumber
- Office