## Initial Data Exploration on 5/25/2020
#### County of Los Angeles Open Data Source
* Dataset #1: https://data.lacounty.gov/Health/LOS-ANGELES-COUNTY-RESTAURANT-AND-MARKET-INSPECTIO/6ni6-h5kp
* Dataset #2: https://data.lacounty.gov/Health/LOS-ANGELES-COUNTY-RESTAURANT-AND-MARKET-VIOLATION/8jyd-4pv9

In [1]:
# Import dependencies
import pandas as pd

In [2]:
# Get datasets using Los Angeles County API
# Note: without a $limit parameter, the API returns only the first 1,000 rows
inspections_dataset = pd.read_json("https://data.lacounty.gov/resource/6ni6-h5kp.json")
violations_dataset = pd.read_json("https://data.lacounty.gov/resource/8jyd-4pv9.json")
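The SODA API behind these endpoints returns only the first 1,000 rows when no `$limit` is given, which is consistent with the 1,000 unique `serial_number` values counted further down. Below is a minimal sketch of paging through the full dataset. It assumes the standard Socrata `$limit`, `$offset` and `$order` query parameters. The helper names `build_page_url` and `fetch_all` and the 10,000-row page size are hypothetical choices, so check the portal's documented per-request maximum before relying on them.

```python
import pandas as pd
from urllib.parse import urlencode

INSPECTIONS_URL = "https://data.lacounty.gov/resource/6ni6-h5kp.json"
PAGE_SIZE = 10_000  # ASSUMPTION: rows per request; the portal may enforce its own cap


def build_page_url(base_url, limit, offset):
    """Append Socrata paging parameters to a resource URL."""
    # $order=:id sorts by the internal row id so pages neither overlap nor skip rows
    params = {"$limit": limit, "$offset": offset, "$order": ":id"}
    return f"{base_url}?{urlencode(params)}"


def fetch_all(base_url, page_size=PAGE_SIZE, read_page=pd.read_json):
    """Read pages until one comes back shorter than page_size, then stack them."""
    pages, offset = [], 0
    while True:
        page = read_page(build_page_url(base_url, page_size, offset))
        pages.append(page)
        if len(page) < page_size:  # a short (or empty) page means we reached the end
            return pd.concat(pages, ignore_index=True)
        offset += page_size
```

With this, `inspections_df = fetch_all(INSPECTIONS_URL)` would replace the single `read_json` call. Each page still goes through `pd.read_json`, so column types are inferred the same way as in the cell above.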

In [3]:
# read_json already returns a DataFrame, so it can be used directly
inspections_df = inspections_dataset
inspections_df.head()

| | activity_date | owner_id | owner_name | facility_id | facility_name | record_id | program_name | program_status | program_element_pe | pe_description | ... | service_description | score | grade | serial_number | employee_id | geocoded_column | :@computed_region_cbw7_skn5 | :@computed_region_pft9_733t | :@computed_region_gj26_y8x3 | :@computed_region_x8wy_s94z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2018-09-10T00:00:00.000 | OW0105348 | GUCKENHEIMER SERVICES, LLC. | FA0242046 | SERVERY- NICKELODEON | PR0190194 | SERVERY- NICKELODEON | ACTIVE | 1635 | RESTAURANT (31-60) SEATS HIGH RISK | ... | ROUTINE INSPECTION | 96 | A | DARRFUZBW | EE0000495 | {'type': 'Point', 'coordinates': [-118.314661,... | 5.0 | 1629.0 | 295.0 | 20146.0 |
| 1 | 2018-07-19T00:00:00.000 | OW0246461 | ANASTACIOS POLITIS | FA0252769 | TOMS JR BURGERS | PR0202127 | TOMS JR BURGERS | ACTIVE | 1632 | RESTAURANT (0-30) SEATS HIGH RISK | ... | ROUTINE INSPECTION | 98 | A | DA0XQVMTN | EE0001130 | {'type': 'Point', 'coordinates': [-118.292543,... | 2.0 | 846.0 | 439.0 | 23668.0 |
| 2 | 2018-08-15T00:00:00.000 | OW0010130 | DJ BIBINGKAHAN CORPORATION | FA0011237 | DJ BIBINGKAHAN | PR0035416 | DJ BIBINGKAHAN BAKESHOP | ACTIVE | 1631 | RESTAURANT (0-30) SEATS MODERATE RISK | ... | ROUTINE INSPECTION | 98 | A | DAMPOJNY8 | EE0000500 | {'type': 'Point', 'coordinates': [-117.913926,... | 1.0 | 2226.0 | 105.0 | 10148.0 |
| 3 | 2018-07-16T00:00:00.000 | OW0020051 | KULWINDER KAUR | FA0061073 | DOROSE LIQUOR | PR0027907 | DOROSE LIQUOR | ACTIVE | 1610 | FOOD MKT RETAIL (1-1,999 SF) LOW RISK | ... | ROUTINE INSPECTION | 91 | A | DAUTU3DPD | EE0000045 | {'type': 'Point', 'coordinates': [-118.428399,... | 3.0 | 1571.0 | 395.0 | 19730.0 |
| 4 | 2018-09-07T00:00:00.000 | OW0246329 | JUAN C OROZCO | FA0252595 | MEJICO GRILL AND TEQUILLA LOUNGE | PR0201914 | MEJICO GRILL AND TEQUILLA LOUNGE | ACTIVE | 1641 | RESTAURANT (151 + ) SEATS HIGH RISK | ... | ROUTINE INSPECTION | 90 | A | DAUEU4NGF | EE0000526 | {'type': 'Point', 'coordinates': [-118.756808,... | 3.0 | 360.0 | 1.0 | 4276.0 |


In [4]:
# read_json already returns a DataFrame, so it can be used directly
violations_df = violations_dataset
violations_df.head()

| | serial_number | violation_status | violation_code | violation_description | points |
|---|---|---|---|---|---|
| 0 | DA000211Z | OUT OF COMPLIANCE | F006 | # 06. Adequate handwashing facilities supplied... | 2 |
| 1 | DA000211Z | OUT OF COMPLIANCE | F044 | # 44. Floors, walls and ceilings: properly bui... | 1 |
| 2 | DA000211Z | OUT OF COMPLIANCE | F014 | # 14. Food contact surfaces: clean and sanitized | 2 |
| 3 | DA000211Z | OUT OF COMPLIANCE | F029 | # 29. Toxic substances properly identified, st... | 1 |
| 4 | DA000211Z | OUT OF COMPLIANCE | F035 | # 35. Equipment/Utensils - approved; installed... | 1 |


In [5]:
print(inspections_df.columns)
print(violations_df.columns)

Index(['activity_date', 'owner_id', 'owner_name', 'facility_id',
       'facility_name', 'record_id', 'program_name', 'program_status',
       'program_element_pe', 'pe_description', 'facility_address',
       'facility_city', 'facility_state', 'facility_zip', 'service_code',
       'service_description', 'score', 'grade', 'serial_number', 'employee_id',
       'geocoded_column', ':@computed_region_cbw7_skn5',
       ':@computed_region_pft9_733t', ':@computed_region_gj26_y8x3',
       ':@computed_region_x8wy_s94z'],
      dtype='object')
Index(['serial_number', 'violation_status', 'violation_code',
       'violation_description', 'points'],
      dtype='object')


In [6]:
# Check the data type of each column in both DataFrames
print(inspections_df.dtypes)
print(violations_df.dtypes)

activity_date                   object
owner_id                        object
owner_name                      object
facility_id                     object
facility_name                   object
record_id                       object
program_name                    object
program_status                  object
program_element_pe               int64
pe_description                  object
facility_address                object
facility_city                   object
facility_state                  object
facility_zip                    object
service_code                     int64
service_description             object
score                            int64
grade                           object
serial_number                   object
employee_id                     object
geocoded_column                 object
:@computed_region_cbw7_skn5    float64
:@computed_region_pft9_733t    float64
:@computed_region_gj26_y8x3    float64
:@computed_region_x8wy_s94z    float64
dtype: object
serial_numb

In [7]:
# Drop columns from inspections_df
inspections_df = inspections_df.drop(columns = ["activity_date", 
                                                ":@computed_region_cbw7_skn5", 
                                                ":@computed_region_pft9_733t", 
                                                ":@computed_region_gj26_y8x3", 
                                                ":@computed_region_x8wy_s94z"])
inspections_df

| | owner_id | owner_name | facility_id | facility_name | record_id | program_name | program_status | program_element_pe | pe_description | facility_address | facility_city | facility_state | facility_zip | service_code | service_description | score | grade | serial_number | employee_id | geocoded_column |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | OW0105348 | GUCKENHEIMER SERVICES, LLC. | FA0242046 | SERVERY- NICKELODEON | PR0190194 | SERVERY- NICKELODEON | ACTIVE | 1635 | RESTAURANT (31-60) SEATS HIGH RISK | 203 W OLIVE AVE # C | BURBANK | CA | 91502 | 1 | ROUTINE INSPECTION | 96 | A | DARRFUZBW | EE0000495 | {'type': 'Point', 'coordinates': [-118.314661,... |
| 1 | OW0246461 | ANASTACIOS POLITIS | FA0252769 | TOMS JR BURGERS | PR0202127 | TOMS JR BURGERS | ACTIVE | 1632 | RESTAURANT (0-30) SEATS HIGH RISK | 1030 W MARTIN LUTHER KING JR BLVD STE 108 | LOS ANGELES | CA | 90037-1867 | 1 | ROUTINE INSPECTION | 98 | A | DA0XQVMTN | EE0001130 | {'type': 'Point', 'coordinates': [-118.292543,... |
| 2 | OW0010130 | DJ BIBINGKAHAN CORPORATION | FA0011237 | DJ BIBINGKAHAN | PR0035416 | DJ BIBINGKAHAN BAKESHOP | ACTIVE | 1631 | RESTAURANT (0-30) SEATS MODERATE RISK | 1515 E AMAR RD | WEST COVINA | CA | 91792 | 1 | ROUTINE INSPECTION | 98 | A | DAMPOJNY8 | EE0000500 | {'type': 'Point', 'coordinates': [-117.913926,... |
| 3 | OW0020051 | KULWINDER KAUR | FA0061073 | DOROSE LIQUOR | PR0027907 | DOROSE LIQUOR | ACTIVE | 1610 | FOOD MKT RETAIL (1-1,999 SF) LOW RISK | 13560 ROSCOE BLVD | PANORAMA CITY | CA | 91402 | 1 | ROUTINE INSPECTION | 91 | A | DAUTU3DPD | EE0000045 | {'type': 'Point', 'coordinates': [-118.428399,... |
| 4 | OW0246329 | JUAN C OROZCO | FA0252595 | MEJICO GRILL AND TEQUILLA LOUNGE | PR0201914 | MEJICO GRILL AND TEQUILLA LOUNGE | ACTIVE | 1641 | RESTAURANT (151 + ) SEATS HIGH RISK | 29002 AGOURA RD | AGOURA HILLS | CA | 91301 | 1 | ROUTINE INSPECTION | 90 | A | DAUEU4NGF | EE0000526 | {'type': 'Point', 'coordinates': [-118.756808,... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 995 | OW0126750 | RUPINDER KAUR | FA0163076 | PRODUCE FOR LESS | PR0151246 | PRODUCE FOR LESS | ACTIVE | 1615 | FOOD MKT RETAIL (2,000+ SF) HIGH RISK | 5059 MELROSE AVE | LOS ANGELES | CA | 90038-4115 | 1 | ROUTINE INSPECTION | 80 | B | DAYNOBKAY | EE0001003 | {'type': 'Point', 'coordinates': [-118.31062, ... |
| 996 | OW0101581 | YOSHINOYA AMERICA, INC. | FA0055749 | YOSHINOYA #1128 | PR0000189 | YOSHINOYA BEEF BOWL REST | ACTIVE | 1632 | RESTAURANT (0-30) SEATS HIGH RISK | 2360 PICO BLVD | SANTA MONICA | CA | 90405 | 1 | ROUTINE INSPECTION | 93 | A | DAX6EQH2R | EE0000999 | {'type': 'Point', 'coordinates': [-118.465831,... |
| 997 | OW0023980 | MCQUAID,MICHAEL AND PATRICK | FA0001328 | JIM'S FALLBROOK MARKET | PR0026439 | JIM'S FALLBROOK MARKET | ACTIVE | 1615 | FOOD MKT RETAIL (2,000+ SF) HIGH RISK | 5947 FALLBROOK AVE | WOODLAND HILLS | CA | 91367 | 1 | ROUTINE INSPECTION | 96 | A | DAUNLUNBV | EE0000520 | {'type': 'Point', 'coordinates': [-118.623337,... |
| 998 | OW0129976 | OSAMA IBRAHIM | FA0166889 | RICK'S MARKET | PR0157225 | RICK'S MARKET | ACTIVE | 1610 | FOOD MKT RETAIL (1-1,999 SF) LOW RISK | 8451 BEVERLY RD | PICO RIVERA | CA | 90660-2200 | 1 | ROUTINE INSPECTION | 92 | A | DANUGOBDL | EE0000799 | {'type': 'Point', 'coordinates': [-118.08959, ... |
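Rather than hard-coding the four `:@computed_region_*` names, here is a sketch that drops every column with that prefix. It assumes the suffixes (such as `cbw7_skn5`) are portal-generated identifiers that could differ between datasets or change over time. `drop_computed_region_columns` is a hypothetical helper name.

```python
import pandas as pd


def drop_computed_region_columns(df):
    """Drop every column whose name starts with ':@computed_region', whatever its suffix."""
    keep = ~df.columns.str.startswith(":@computed_region")  # boolean mask over column names
    return df.loc[:, keep]
```

Usage would be `inspections_df.drop(columns=["activity_date"]).pipe(drop_computed_region_columns)`, which gives the same result as the explicit list above without depending on the exact suffixes.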


In [16]:
unique_owner_ids = pd.unique(inspections_df["owner_id"])
unique_owner_ids

array(['OW0105348', 'OW0246461', 'OW0010130', 'OW0020051', 'OW0246329',
       'OW0123199', 'OW0255633', 'OW0026553', 'OW0012926', 'OW0102114',
       'OW0004005', 'OW0251049', 'OW0040037', 'OW0014320', 'OW0183492',
       'OW0030197', 'OW0019185', 'OW0028928', 'OW0126723', 'OW0236213',
       'OW0018554', 'OW0040162', 'OW0029792', 'OW0021305', 'OW0031821',
       'OW0036820', 'OW0004579', 'OW0185308', 'OW0011794', 'OW0128889',
       'OW0243325', 'OW0013578', 'OW0238416', 'OW0236138', 'OW0126758',
       'OW0242903', 'OW0123582', 'OW0013175', 'OW0029458', 'OW0034906',
       'OW0100835', 'OW0039129', 'OW0100543', 'OW0039092', 'OW0027252',
       'OW0033795', 'OW0178089', 'OW0038181', 'OW0024410', 'OW0030630',
       'OW0251270', 'OW0026360', 'OW0181744', 'OW0034288', 'OW0125100',
       'OW0024981', 'OW0251892', 'OW0013360', 'OW0036987', 'OW0130235',
       'OW0014977', 'OW0012108', 'OW0255323', 'OW0228609', 'OW0237158',
       'OW0245020', 'OW0034460', 'OW0004937', 'OW0251323', 'OW00

In [17]:
len(unique_owner_ids)

902

In [18]:
unique_serial_numbers = pd.unique(inspections_df["serial_number"])
unique_serial_numbers

array(['DARRFUZBW', 'DA0XQVMTN', 'DAMPOJNY8', 'DAUTU3DPD', 'DAUEU4NGF',
       'DARQIUA45', 'DA0JE0XXO', 'DAGKKNDCV', 'DACOHF2H4', 'DACIDHL0U',
       'DABWNUI3O', 'DA30IVVSL', 'DAN40VEF6', 'DAIPLB050', 'DAQHC9UMM',
       'DAERIHGIJ', 'DAR1E1EVY', 'DA0XCDE0H', 'DAPCQ9VAR', 'DAXHN6FYF',
       'DAJNK3M9V', 'DA0RAQH09', 'DADJBKMVN', 'DA3ETVYNG', 'DAGJFPDYA',
       'DA2NXQSSE', 'DAWZLOB5R', 'DATWYNKMQ', 'DAYAKUS0H', 'DAU2N88BR',
       'DASU98I1X', 'DAPJC0PEP', 'DAYTAINGE', 'DAESOZK6V', 'DA0PQO0D7',
       'DAJYQWNT9', 'DA6QUZNUI', 'DAGWEB0YJ', 'DAYNBEAMZ', 'DAFMFZTEM',
       'DAEKZZQ0T', 'DAMFWDZJN', 'DANRLIJF4', 'DAASHXDT2', 'DATLRMXCD',
       'DAO5MWBVG', 'DAOCBEI4O', 'DAXHCPTST', 'DAZKHCE6N', 'DABEETVPE',
       'DAZB7PYMG', 'DACGKM4RT', 'DAPDPBXLG', 'DADXI0TX5', 'DA0MH7IUE',
       'DAFWMG60I', 'DAD0ZDHOE', 'DA46UTS3F', 'DAKBA3CLL', 'DANR5V2IU',
       'DALOW2TPR', 'DAF5MP0BB', 'DA9OPRWPJ', 'DAVAV05FU', 'DAZ6FVFOL',
       'DAF06XHF5', 'DAH0UE2OQ', 'DASTIDGUP', 'DAHPDU61P', 'DABF

In [19]:
len(unique_serial_numbers)

1000

**NOTE**: Use 'serial_number' to merge both datasets (all 1,000 values in inspections_df are unique). Should this column also be set as the index?
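Here is a sketch of the merge described in the note. It assumes `serial_number` links one inspection to its violation rows; the violations preview shows `DA000211Z` repeated across several rows. `merge(on=...)` joins on an ordinary column, so setting `serial_number` as the index is optional. `merge_inspections_violations` is a hypothetical helper.

```python
import pandas as pd


def merge_inspections_violations(inspections, violations):
    """Left-join each inspection to its violation rows on 'serial_number'."""
    return inspections.merge(
        violations,
        on="serial_number",
        how="left",              # keep inspections that have no violation rows
        validate="one_to_many",  # raises MergeError if a serial_number repeats in inspections
        indicator=True,          # adds a '_merge' column: 'both' or 'left_only'
    )
```

Each API call above fetched only part of its dataset, so many inspections may end up `left_only` with no matching violation rows. Running `merged["_merge"].value_counts()` would show how large the overlap actually is.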

**Categorical Data**
* program_status
* service_description
* grade
* violation_status
* violation_code
* violation_description
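Here is a minimal sketch for the categorical columns listed above. Storing them with pandas' `category` dtype saves memory on repeated labels and makes frequency counts straightforward. `to_categorical` is a hypothetical helper; the column lists in the comments are taken from the notes above.

```python
import pandas as pd


def to_categorical(df, columns):
    """Return a copy of df with the listed columns stored as 'category' dtype."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].astype("category")
    return out

# Possible usage with the columns listed above:
# inspections_df = to_categorical(inspections_df, ["program_status", "service_description", "grade"])
# violations_df = to_categorical(violations_df, ["violation_status", "violation_code", "violation_description"])
# inspections_df["grade"].value_counts()
```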

**Numerical Data**
* score
* points

---> Use summary statistics on numerical data.
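The summary statistics could start with `describe()` on the two numerical columns, plus the score distribution within each letter grade. Both function names below are hypothetical.

```python
import pandas as pd


def summarize_numeric(inspections, violations):
    """Side-by-side describe() output (count, mean, std, min, quartiles, max) for 'score' and 'points'."""
    return pd.DataFrame({
        "score": inspections["score"].describe(),
        "points": violations["points"].describe(),
    })


def score_by_grade(inspections):
    """Count, mean, min and max score within each letter grade."""
    return inspections.groupby("grade")["score"].agg(["count", "mean", "min", "max"])
```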

**:: LIGHTBULB ::** 
* Do further exploration on 'pe_description'
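    * It appears to combine the facility type (restaurant vs. market), a size range (number of seats for restaurants, square footage for markets), and a risk level (low/moderate/high risk). Perhaps we can split these into separate columns?
* Split 'geocoded_column' into latitude & longitude values (each cell is a GeoJSON-style Point whose 'coordinates' are listed as [lng, lat])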
* Use summary statistics on numerical data
    * Determine average score of chain restaurants
    * Determine average score of local/small/family-owned restaurants
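The notebook doesn't define "chain" yet. One possible proxy, and it is only an assumption, is to treat an owner with two or more distinct `facility_id` values as a chain and every other owner as independent. Below is a sketch of comparing average scores under that rule. `average_score_by_owner_type` and the `min_facilities` threshold are hypothetical.

```python
import pandas as pd


def average_score_by_owner_type(df, min_facilities=2):
    """Mean inspection score for 'chain' vs 'independent' owners.

    ASSUMPTION: an owner with >= min_facilities distinct facility_ids counts as a chain.
    """
    # Number of distinct facilities per owner, broadcast back to every row
    n_facilities = df.groupby("owner_id")["facility_id"].transform("nunique")
    owner_type = (n_facilities >= min_facilities).map({True: "chain", False: "independent"})
    return df.groupby(owner_type.rename("owner_type"))["score"].mean()
```

Because only part of the data was fetched, an owner's other facilities may be missing from the sample. This means some chains could be labeled independent until the full dataset is loaded.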

**:: NEXT STEPS ::**
* 'pe_description' and 'geocoded_column' have data for deeper insights --> PARSE THROUGH BOTH COLUMNS :)
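Here is a sketch of parsing both columns. It is based only on the `pe_description` formats visible in the previews above (e.g. `RESTAURANT (31-60) SEATS HIGH RISK`, `FOOD MKT RETAIL (1-1,999 SF) LOW RISK`). Other formats in the full data may not match the regular expression and would come back as NaN. The sketch also assumes `read_json` parsed each `geocoded_column` cell into a dict, as the `{'type': 'Point', ...}` output suggests. GeoJSON lists coordinates as [longitude, latitude]. The helper names are hypothetical.

```python
import pandas as pd

# Built from the formats seen above: "<TYPE> (<size range>) [SEATS] <LEVEL> RISK".
# For markets the size range keeps its "SF" (square feet) suffix.
PE_PATTERN = (
    r"^(?P<facility_type>.*?)\s*\((?P<size>[^)]*)\)\s*"
    r"(?:SEATS\s*)?(?P<risk_level>LOW|MODERATE|HIGH) RISK$"
)


def split_pe_description(desc):
    """Split pe_description into facility type, size range and risk level.

    Rows that don't match PE_PATTERN come back as NaN in all three columns.
    """
    parts = desc.str.extract(PE_PATTERN)
    parts["size"] = parts["size"].str.strip()  # "151 + " -> "151 +"
    return parts


def split_geocoded(geo):
    """Pull lat/lng out of GeoJSON-style Point dicts (coordinates are [lng, lat])."""
    def coord(g, i):
        if isinstance(g, dict) and "coordinates" in g:
            return g["coordinates"][i]
        return float("nan")  # missing or unexpected cell
    return pd.DataFrame({
        "lat": geo.map(lambda g: coord(g, 1)),
        "lng": geo.map(lambda g: coord(g, 0)),
    })

# Possible usage:
# inspections_df = inspections_df.join(split_pe_description(inspections_df["pe_description"]))
# inspections_df = inspections_df.join(split_geocoded(inspections_df["geocoded_column"]))
```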