### Paycheck Protection Program

Let's analyze **an excerpt** from <a href="https://www.sba.gov/funding-programs/loans/covid-19-relief-options/paycheck-protection-program">SBA's PPP loan data</a>.

The file we're working with is only an excerpt (ingested from the link below), so we can't draw any conclusions from our findings in this homework.

In [1]:
## import library
import pandas as pd

In [2]:
## Ingest remote data
df = pd.read_csv("https://raw.githubusercontent.com/sandeepmj/datasets/main/ppp_excerpt.csv")
df

| | LoanNumber | DateApproved | SBAOfficeCode | ProcessingMethod | BorrowerName | BorrowerAddress | BorrowerCity | BorrowerState | BorrowerZip | LoanStatusDate | ... | BusinessType | OriginatingLenderLocationID | OriginatingLender | OriginatingLenderCity | OriginatingLenderState | Gender | Veteran | NonProfit | ForgivenessAmount | ForgivenessDate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9547507704 | 05/01/2020 | 464 | PPP | SUMTER COATINGS, INC. | 2410 Highway 15 South | Sumter | | 29150-9662 | 12/18/2020 | ... | Corporation | 19248 | Synovus Bank | COLUMBUS | GA | Unanswered | Unanswered | | 773553.37 | 11/20/2020 |
| 1 | 9777677704 | 05/01/2020 | 464 | PPP | PLEASANT PLACES, INC. | 7684 Southrail Road | North Charleston | | 29420-9000 | 09/28/2021 | ... | Sole Proprietorship | 19248 | Synovus Bank | COLUMBUS | GA | Male Owned | Non-Veteran | | 746336.24 | 08/12/2021 |
| 2 | 5791407702 | 05/01/2020 | 1013 | PPP | BOYER CHILDREN'S CLINIC | 1850 BOYER AVE E | SEATTLE | | 98112-2922 | 03/17/2021 | ... | Non-Profit Organization | 9551 | Bank of America, National Association | CHARLOTTE | NC | Unanswered | Unanswered | Y | 696677.49 | 02/10/2021 |
| 3 | 6223567700 | 05/01/2020 | 920 | PPP | KIRTLEY CONSTRUCTION INC | 1661 MARTIN RANCH RD | SAN BERNARDINO | | 92407-1740 | 10/16/2021 | ... | Corporation | 9551 | Bank of America, National Association | CHARLOTTE | NC | Male Owned | Non-Veteran | | 395264.11 | 09/10/2021 |
| 4 | 9662437702 | 05/01/2020 | 101 | PPP | AERO BOX LLC | | | | | 08/17/2021 | ... | | 57328 | The Huntington National Bank | COLUMBUS | OH | Unanswered | Unanswered | | 370819.35 | 04/08/2021 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 995 | 3857047706 | 05/01/2020 | 1084 | PPP | NORTHERN ALASKA CONTRACTORS LLC | 3610 MERE CIRCLE | ANCHORAGE | AK | 99502 | 07/13/2021 | ... | Limited Liability Company(LLC) | 12096 | Wells Fargo Bank, National Association | SIOUX FALLS | SD | Unanswered | Unanswered | | 372875.94 | 06/22/2021 |
| 996 | 6411717106 | 04/14/2020 | 1084 | PPP | L&M EQUIPMENT, INC. | PO BOX 241 | NAKNEK | AK | 99633 | 04/27/2021 | ... | Corporation | 3386 | First National Bank Alaska | ANCHORAGE | AK | Male Owned | Non-Veteran | | 360636.31 | 12/17/2020 |
| 997 | 6297468301 | 01/26/2021 | 1084 | PPS | EL CAPITAN LODGE LLC | 1 Sarkar Rd. | Sarkar Cove | AK | 99921 | | ... | Limited Liability Company(LLC) | 59358 | BOKF, National Association | TULSA | OK | Male Owned | Non-Veteran | | 372821.37 | 06/07/2022 |
| 998 | 6226697002 | 04/06/2020 | 1084 | PPP | WASILLA DRIVE-IN LLC | 2051 E. Sun Mountain Ave. | WASILLA | AK | 99654-7351 | 08/19/2021 | ... | Limited Liability Company(LLC) | 116975 | Northrim Bank | ANCHORAGE | AK | Unanswered | Unanswered | | 372314.56 | 07/22/2021 |


In [3]:
## get info about data types
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 53 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   LoanNumber                   1000 non-null   int64  
 1   DateApproved                 1000 non-null   object 
 2   SBAOfficeCode                1000 non-null   int64  
 3   ProcessingMethod             1000 non-null   object 
 4   BorrowerName                 1000 non-null   object 
 5   BorrowerAddress              992 non-null    object 
 6   BorrowerCity                 992 non-null    object 
 7   BorrowerState                987 non-null    object 
 8   BorrowerZip                  992 non-null    object 
 9   LoanStatusDate               916 non-null    object 
 10  LoanStatus                   1000 non-null   object 
 11  Term                         1000 non-null   int64  
 12  SBAGuarantyPercentage        1000 non-null   int64  
 13  InitialApprovalAmou
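
The `df.info()` output shows that `DateApproved` is stored as `object` (text) and that `LoanStatusDate` has only 916 non-null values. Here is a minimal sketch of how we might count missing values and parse the dates. It uses three rows copied from the table above rather than the full file. The `toy` DataFrame name is ours, and the `%m/%d/%Y` format is an assumption based on how the dates look in the output:

```python
import pandas as pd

# Three rows copied from the displayed output (index 0, 996 and 997);
# "toy" is a hypothetical name used only for this illustration
toy = pd.DataFrame({
    "DateApproved": ["05/01/2020", "04/14/2020", "01/26/2021"],
    "LoanStatusDate": ["12/18/2020", "04/27/2021", None],
})

# Count missing values per column
missing = toy.isna().sum()
print(missing)

# Parse the text dates; the month/day/year format is an assumption
for col in ["DateApproved", "LoanStatusDate"]:
    toy[col] = pd.to_datetime(toy[col], format="%m/%d/%Y")

print(toy.dtypes)
```

Once the columns are real dates, a missing `LoanStatusDate` shows up as `NaT` instead of an empty cell.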

### ```.agg()```

- Use the ```.agg()``` method, which lets you compute several specific summary statistics at once.

In [4]:
## get the count, mean and median for CurrentApprovalAmount

df["CurrentApprovalAmount"].agg(["count", "mean", "median"])

count       1000.00000
mean      964703.54666
median    631722.27500
Name: CurrentApprovalAmount, dtype: float64
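
`.agg()` also works on a whole DataFrame, and a dictionary lets each column get its own statistics. A small sketch on made-up numbers (the `toy` values are invented for illustration and are not PPP data):

```python
import pandas as pd

# Invented values for illustration only; not taken from the PPP excerpt
toy = pd.DataFrame({
    "InitialApprovalAmount": [100.0, 200.0, 300.0, 400.0],
    "CurrentApprovalAmount": [110.0, 190.0, 300.0, 1000.0],
})

# A list of statistic names applied to one column returns a Series
one_column = toy["CurrentApprovalAmount"].agg(["count", "mean", "median"])

# A dictionary maps each column to its own list of statistics;
# cells for statistics not requested for a column come back as NaN
per_column = toy.agg({
    "InitialApprovalAmount": ["min", "max"],
    "CurrentApprovalAmount": ["mean", "median"],
})

print(one_column)
print(per_column)
```

In the toy data the single large value pulls the mean (400) well above the median (245), the same pattern we see between the mean and median of `CurrentApprovalAmount` above.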

In [7]:
## What are the count, mean and median for InitialApprovalAmount and CurrentApprovalAmount?
## If you get scientific notation, change the default float display before running the summary
pd.options.display.float_format = '{:,.0f}'.format  ## formatting code (always a copy and paste)
df[["InitialApprovalAmount", "CurrentApprovalAmount"]].agg(["count", "mean", "median"])

| | InitialApprovalAmount | CurrentApprovalAmount |
|---|---|---|
| count | 1000 | 1000 |
| mean | 952351 | 964704 |
| median | 617999 | 631722 |
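
Setting `pd.options.display.float_format` changes how every float is displayed for the rest of the session. If we only want the formatting for one result, a sketch using `pd.option_context` might look like this (the `toy` DataFrame is our own name and just reuses the mean and median printed earlier):

```python
import pandas as pd

# The mean and median of CurrentApprovalAmount printed earlier
toy = pd.DataFrame({"CurrentApprovalAmount": [964703.54666, 631722.27500]})

# The format applies only inside the "with" block
with pd.option_context("display.float_format", "{:,.0f}".format):
    formatted = toy.to_string()

# Outside the block the default display is back
plain = toy.to_string()

print(formatted)
print(plain)
```

Either way, only the display changes; the stored values keep their decimals. To undo a global setting, `pd.reset_option("display.float_format")` restores the default.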


## ```groupby```

In [11]:

## What are the sum, mean and count of the current
## approval amount for owners by race?


df.groupby(["Race"])["CurrentApprovalAmount"].agg(["sum", "mean", "count"])

| Race | sum | mean | count |
|---|---|---|---|
| American Indian or Alaska Native | 3123725 | 520621 | 6 |
| Asian | 9282996 | 663071 | 14 |
| Black or African American | 5608920 | 1121784 | 5 |
| Unanswered | 811802131 | 992423 | 818 |
| White | 134885776 | 859145 | 157 |


## What question(s) might you ask based on this result?

List answer(s):
#### (we discuss these in class)
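
One thing the by-race result makes visible is that 818 of the 1,000 loans have `Unanswered` for race. A sketch of how we might add each group's share of loans and sort by group size, on invented rows (the `toy` data and the `share_of_loans` column name are ours, not part of the PPP file):

```python
import pandas as pd

# Invented rows for illustration only; not taken from the PPP excerpt
toy = pd.DataFrame({
    "Race": ["White", "Unanswered", "Unanswered", "Asian", "Unanswered"],
    "CurrentApprovalAmount": [100.0, 200.0, 300.0, 400.0, 500.0],
})

by_race = toy.groupby("Race")["CurrentApprovalAmount"].agg(["sum", "mean", "count"])

# Share of all loans that falls into each group (hypothetical column name)
by_race["share_of_loans"] = by_race["count"] / by_race["count"].sum()

# Largest groups first
by_race = by_race.sort_values("count", ascending=False)
print(by_race)
```

A mean computed from 5 or 6 loans is much less stable than one computed from hundreds, so it helps to read the `mean` column next to `count`.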



In [15]:
## What are the sum and mean of ForgivenessAmount for owners by gender?
df.groupby(["Gender"])["ForgivenessAmount"].agg(["sum", "mean"])

| Gender | sum | mean |
|---|---|---|
| Female Owned | 62275639 | 819416 |
| Male Owned | 245233157 | 814728 |
| Unanswered | 546130000 | 978728 |
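
Dividing each sum by its mean gives roughly 76, 301 and 558 loans, or about 935 in total, which suggests that some of the 1,000 rows have no `ForgivenessAmount` (or no `Gender`). `sum` and `mean` skip missing values silently, so it can help to ask for `count` (non-missing values) and `size` (all rows) as well. A sketch on invented rows (the `toy` data is ours):

```python
import numpy as np
import pandas as pd

# Invented rows for illustration only; one ForgivenessAmount is missing
toy = pd.DataFrame({
    "Gender": ["Female Owned", "Female Owned", "Male Owned", "Male Owned"],
    "ForgivenessAmount": [100.0, np.nan, 200.0, 400.0],
})

# "count" ignores NaN, "size" counts every row in the group
by_gender = toy.groupby("Gender")["ForgivenessAmount"].agg(
    ["sum", "mean", "count", "size"]
)
print(by_gender)
```

Where `count` is smaller than `size`, the mean describes only the loans that have a forgiveness amount recorded.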


In [17]:
## What are the sum, mean and count of the current
## approval amount for owners by race and gender?


df.groupby(["Race", "Gender"])["CurrentApprovalAmount"]\
.agg(["sum", "mean", "count"])

| Race | Gender | sum | mean | count |
|---|---|---|---|---|
| American Indian or Alaska Native | Female Owned | 930692 | 465346 | 2 |
| American Indian or Alaska Native | Male Owned | 2193033 | 548258 | 4 |
| Asian | Female Owned | 4021749 | 670292 | 6 |
| Asian | Male Owned | 4661817 | 665974 | 7 |
| Asian | Unanswered | 599430 | 599430 | 1 |
| Black or African American | Male Owned | 5608920 | 1121784 | 5 |
| Unanswered | Female Owned | 33267569 | 723208 | 46 |
| Unanswered | Male Owned | 156380018 | 814479 | 192 |
| Unanswered | Unanswered | 622154544 | 1072680 | 580 |
| White | Female Owned | 24005244 | 1043706 | 23 |


In [18]:
## What are the sum, mean and count of the current
## approval amount for owners by gender and race?

df.groupby(["Gender", "Race"])["CurrentApprovalAmount"]\
.agg(["sum", "mean", "count"])

| Gender | Race | sum | mean | count |
|---|---|---|---|---|
| Female Owned | American Indian or Alaska Native | 930692 | 465346 | 2 |
| Female Owned | Asian | 4021749 | 670292 | 6 |
| Female Owned | Unanswered | 33267569 | 723208 | 46 |
| Female Owned | White | 24005244 | 1043706 | 23 |
| Male Owned | American Indian or Alaska Native | 2193033 | 548258 | 4 |
| Male Owned | Asian | 4661817 | 665974 | 7 |
| Male Owned | Black or African American | 5608920 | 1121784 | 5 |
| Male Owned | Unanswered | 156380018 | 814479 | 192 |
| Male Owned | White | 96671077 | 826249 | 117 |
| Unanswered | Asian | 599430 | 599430 | 1 |
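
The last two cells return the same numbers; only the nesting of the row labels changes with the order of the keys passed to `groupby`. A sketch that checks this on invented rows (the `toy` data and variable names are ours):

```python
import pandas as pd

# Invented rows for illustration only; not taken from the PPP excerpt
toy = pd.DataFrame({
    "Race": ["Asian", "Asian", "White", "White"],
    "Gender": ["Female Owned", "Male Owned", "Male Owned", "Male Owned"],
    "CurrentApprovalAmount": [100.0, 200.0, 300.0, 500.0],
})

stats = ["sum", "mean", "count"]
race_first = toy.groupby(["Race", "Gender"])["CurrentApprovalAmount"].agg(stats)
gender_first = toy.groupby(["Gender", "Race"])["CurrentApprovalAmount"].agg(stats)

# Swap the two index levels of the first result and re-sort the rows
reordered = race_first.swaplevel().sort_index()

# Compare with the result grouped in the other order
same = reordered.equals(gender_first)
print(same)
```

So the choice of key order is about which comparison we want to read down the page, not about the values themselves.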


## What question(s) might you ask about the unanswered categories?
List answer(s) here:
#### (we discuss these in class)