Approach:
Use the same data as in assignment 1, but this time identify the top-10 tokens that occur in the regulation descriptions in the table.
1. As in assignment 1, extract the regulation descriptions from each record that corresponds to a failed inspection.
2. Tokenize each regulation description.
3. Find the top-10 tokens (for the whole table).
4. Clean the data: convert to lower case and remove stopwords, punctuation, numbers, etc.
5. Find the top-10 tokens again.
6. Find the top-10 tokens after applying Porter stemming to the tokens obtained in step 4.
7. Find the top-10 tokens after applying Lancaster stemming to the tokens obtained in step 4.
8. Find the top-10 tokens after applying lemmatization to the tokens obtained in step 4.
9. Compare the top-10 tokens obtained in steps 3, 5, 6, 7, and 8.

In [52]:
import pandas as pd 
import numpy as np
import re

pd.set_option('display.max_colwidth', None)

## Extract regulation descriptions from each record corresponding to a failed inspection

In [53]:
df = pd.read_csv('Food_Inspections.csv')
df.head()

Unnamed: 0,Inspection ID,DBA Name,AKA Name,License #,Facility Type,Risk,Address,City,State,Zip,Inspection Date,Inspection Type,Results,Violations,Latitude,Longitude,Location
0,2561123,LOWCOUNTRY,LOWCOUNTRY,1042664.0,Restaurant,Risk 1 (High),3343 N CLARK ST,CHICAGO,IL,60657.0,07/21/2022,Complaint,No Entry,,41.942869,-87.652863,"(41.942869318828365, -87.65286280377227)"
1,2560419,CHARTWELLS,MANSUETO HIGH SCHOOL,2549059.0,HIGH SCHOOL KITCHEN,Risk 1 (High),2911 W W 47TH ST,CHICAGO,IL,60632.0,07/07/2022,Canvass,Out of Business,,,,
2,2557095,WOW BAO,WOW BAO,1379974.0,Restaurant,Risk 1 (High),835 N MICHIGAN AVE,CHICAGO,IL,60611.0,06/09/2022,Canvass Re-Inspection,Pass,,41.897741,-87.623961,"(41.897740856252504, -87.62396131598219)"
3,2557044,PAN ARTESANAL,PAN ARTESANAL,2602146.0,Bakery,Risk 1 (High),3724 W FULLERTON AVE,CHICAGO,IL,60647.0,06/09/2022,Canvass,Pass,,41.92467,-87.720445,"(41.92467025197142, -87.72044496440567)"
4,2556917,BISTRO,BISTRO,2846045.0,Restaurant,Risk 1 (High),1400 S JEAN BAPTISTE POINTE DUSABLE LAKESHORE DR,CHICAGO,IL,60605.0,06/07/2022,Canvass,Pass,,,,


In [54]:
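# Keep only failed inspections that list at least one violation
df = df[(df['Results'] == 'Fail') & df['Violations'].notna()]
# Copy the slice so that adding a column later does not raise SettingWithCopyWarning
df = df[['Violations']].copy()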

In [55]:
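def extract_description(t):
    """
    Extract the regulation descriptions from one Violations string.

    Each violation has the form "<number>. <DESCRIPTION> - Comments: ...",
    so a description is a run of upper-case letters, spaces, and
    punctuation that follows whitespace and ends with " -".
    """
    description_regex = r"\s[A-Z \W]+ -"
    descriptions = re.findall(description_regex, t)
    # Drop the leading whitespace and the trailing " -" from each match
    descriptions = [description[1:-2] for description in descriptions]
    return descriptions

df['descriptions'] = df['Violations'].apply(extract_description)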
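
To see what the pattern captures, the sketch below re-declares the same regular expression and runs it on a shortened violation string. The sample is abridged from the first failed record, and the names `DESCRIPTION_REGEX`, `sample`, and `found` are introduced only for this illustration.

```python
import re

# Same pattern as in extract_description: whitespace, then a run of
# upper-case letters / spaces / non-word characters, ending in " -".
DESCRIPTION_REGEX = r"\s[A-Z \W]+ -"

# Abridged sample in the "<number>. <DESCRIPTION> - Comments: ..." layout.
sample = (
    "6. PROPER EATING, TASTING, DRINKING, OR TOBACCO USE - Comments: "
    "2-401.11 OBSERVED COFFEE CUPS STORED IN THE MEAT PREP AREA. | "
    "38. INSECTS, RODENTS, & ANIMALS NOT PRESENT - Comments: "
    "6-501.112 OBSERVED DEAD INSECTS ON THE FLOOR."
)

# Strip the leading whitespace and the trailing " -" from each match.
found = [m[1:-2] for m in re.findall(DESCRIPTION_REGEX, sample)]
for description in found:
    print(description)
```

The comment text is skipped because it never ends in " -" directly after an upper-case run: digits such as the code `2-401.11` and the lower-case letters in `Comments` interrupt the character class.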

In [56]:
df.head()

Unnamed: 0,Violations,descriptions
24,"6. PROPER EATING, TASTING, DRINKING, OR TOBACCO USE - Comments: 2-401.11 OBSERVED COFFEE CUPS, PIZZA BOXES AND DOUGHNUTS STORED IN THE MEAT PREP AREA ON THE CUTTING BOARDS AND COUNTERTOPS. MANAGEMENT INSTRUCTED THAT EMPLOYEES MUST EAT AND DRINK IN DESIGNATED AREAS ONLY. | 10. ADEQUATE HANDWASHING SINKS PROPERLY SUPPLIED AND ACCESSIBLE - Comments: 6-301.14 OBSERVED NO HAND WASHING SIGN LOCATED IN THE WASHROOM ADJACENT TO THE MEAT PREP AREA. MANAGEMENT INSTRUCTED TO INSTALL AND MAINTAIN. | 10. ADEQUATE HANDWASHING SINKS PROPERLY SUPPLIED AND ACCESSIBLE - Comments: 5-204.11 OBSERVED NO EXPOSED HAND WASHING SINK LOCATED AT THE FRONT CASH REGISTER STATION WHERE FETA CHEESE IS SERVED OUT OF THE DELI CASE TO CUSTOMERS. MANAGEMENT INSTRUCTED TO INSTALL AN EXPOSED HAND WASHING SINK IN THIS AREA OR RELOCATE THE COOLER WHERE A HAND WASHING SINK IS CLOSE AND ACCESSIBLE. PRIORITY FOUNDATION 7-38-030(C). NO CITATION ISSUED. | 29. COMPLIANCE WITH VARIANCE/SPECIALIZED PROCESS/HACCP - Comments: 3-502.12 VACUUM PACKAGING DEVICE ON COUNTERTOP IN REAR MEAT PREPARATION AREA. UNIT TAGGED 'HELD FOR INSPECTION' AT THIS TIME. MUST OBTAIN CDPH APPROVAL PRIOR TO TAG REMOVAL AND USE OF DEVICE. | 38. INSECTS, RODENTS, & ANIMALS NOT PRESENT - Comments: 6-501.112 OBSERVED DEAD INSECTS ON THE FLOOR OF THE BASEMENT STORAGE AREA. MANAGEMENT INSTRUCTED TO CLEAN AND REMOVE ALL DEAD INSECTS. | 47. FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED - Comments: 4-501.12 OBSERVED SOME CUTTING BOARDS IN THE MEAT PREP AND PRODUCE PREP AREA THAT ARE HEAVILY SCORED AND DISCOLORED. MANAGEMENT INSTRUCTED TO RESURFACE OR REPLACE THE CUTTING BOARDS. | 47. FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED - Comments: 4-202.16 OBSERVED MILK AND SODA CRATES USED FOR STORAGE IN THE WALK-IN COOLER WITH ACCUMULATED FOOD DEBRIS UNDER THE CRATES. 
MANAGEMENT INSTRUCTED TO REMOVE AND REPLACE WITH SOMETHING WHICH ALLOWS FOR EASY CLEANING, SUCH AS RAISED SHELVING UNITS. | 49. NON-FOOD/FOOD CONTACT SURFACES CLEAN - Comments: 4-602.13 CLEAN THE TOASTER OVEN IN THE PRODUCE PREP AREA, AND THE FAN COVERS IN THE MEAT WALK-IN COOLER. | 53. TOILET FACILITIES: PROPERLY CONSTRUCTED, SUPPLIED, & CLEANED - Comments: 6-501.19 THE SELF-CLOSING DOOR DEVICE FOR THE WASHROOM DOOR IS NOT INSTALLED. MANAGEMENT INSTRUCTED TO REPAIR AND MAINTAIN. | 54. GARBAGE & REFUSE PROPERLY DISPOSED; FACILITIES MAINTAINED - Comments: 5-501.110 OBSERVED USED CUTTING BOARDS AND GARBAGE AND LITTER ON THE GROUND SURROUNDING THE GARBAGE DUMPSTERS. MANAGEMENT INSTRUCTED TO CLEAN AND MAINTAIN THE OUTSIDE GARBAGE AREA AT ALL TIMES. PRIORITY FOUNDATION 7-38-020(B). CITATION ISSUED. | 55. PHYSICAL FACILITIES INSTALLED, MAINTAINED & CLEAN - Comments: 6-501.114 REMOVE ALL UNNECESSARY ITEMS FROM UNDER THE STAIRCASE TO THE OFFICE AND BASEMENT. STORE ALL ELSE TO PREVENT PEST HARBORAGE AND TO FACILITATE CLEANING. | 55. PHYSICAL FACILITIES INSTALLED, MAINTAINED & CLEAN - Comments: 6-501.12 CLEAN THE WALLS IN THE MEAT PREP AREAS AND PRODUCE PREP AREA WITH FOOD DEBRIS AND SPLATTER. CLEAN THE FLOOR THROUGHOUT THE PREP AREAS, WALK-IN COOLERS AND BASEMENT. REMOVE ALL LITTER AND DEBRIS. | 55. PHYSICAL FACILITIES INSTALLED, MAINTAINED & CLEAN - Comments: 6-501.16 OBSERVED A WET MOP STORED ON THE FLOOR IN THE PRODUCE PREP AREA. MANAGEMENT INSTRUCTED TO HANG ALL MOPS TO ALLOW THEM TO AIR DRY. | 59. PREVIOUS PRIORITY FOUNDATION VIOLATION CORRECTED - Comments: 8-404.13(B)(3) PREVIOUS PRIORITY FOUNDATION VIOLATION FROM REPORT #2528686 ON 8/27/21 NOT CORRECTED: 38 - OBSERVED OVER 50 LIVE SMALL FLYING INSECTS ON THE WALLS AND FLYING AROUND IN THE PRODUCE PREP AREA, MEAT PREP AREAS AND FRUIT DISPLAY AREAS. ADDITIONAL PEST CONTROL SERVICE IS NEEDED TO ELIMINATE THE PEST ACTIVITY. PRIORITY 7-42-090. CITATION ISSUED. | 60. 
PREVIOUS CORE VIOLATION CORRECTED - Comments: 8-404.13(B)(4) PREVIOUS CORE VIOLATIONS FROM REPORT #2509827 ON 5/17/21 NOT CORRECTED: 57 - NO PROOF OF FOOD HANDLER TRAINING CERTIFICATE FOR ALL EMPLOYEES HANDLING FOOD. INSTRUCTED TO PROVIDE. PRIORITY FOUNDATION 7-42-090. | 64. MISCELLANEOUS / PUBLIC HEALTH ORDERS - Comments: OBSERVED SEVERAL EMPLOYEES AND CUSTOMERS NOT WEARING A FACE MASK WHILE INSIDE OF THE STORE. MANAGEMENT INSTRUCTED THAT ALL EMPLOYEES AND CUSTOMERS MUST WEAR A FACE MASK WHILE INSIDE OF THE STORE AT ALL TIMES. UNSAFE AND UNSANITARY PREMISES 7-28-060. CITATION ISSUED.","[PROPER EATING, TASTING, DRINKING, OR TOBACCO USE, ADEQUATE HANDWASHING SINKS PROPERLY SUPPLIED AND ACCESSIBLE, ADEQUATE HANDWASHING SINKS PROPERLY SUPPLIED AND ACCESSIBLE, COMPLIANCE WITH VARIANCE/SPECIALIZED PROCESS/HACCP, INSECTS, RODENTS, & ANIMALS NOT PRESENT, FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED, FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED, NON-FOOD/FOOD CONTACT SURFACES CLEAN, TOILET FACILITIES: PROPERLY CONSTRUCTED, SUPPLIED, & CLEANED, GARBAGE & REFUSE PROPERLY DISPOSED; FACILITIES MAINTAINED, PHYSICAL FACILITIES INSTALLED, MAINTAINED & CLEAN, PHYSICAL FACILITIES INSTALLED, MAINTAINED & CLEAN, PHYSICAL FACILITIES INSTALLED, MAINTAINED & CLEAN, PREVIOUS PRIORITY FOUNDATION VIOLATION CORRECTED, PREVIOUS CORE VIOLATION CORRECTED, MISCELLANEOUS / PUBLIC HEALTH ORDERS]"
33,"3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL EMPLOYEE; KNOWLEDGE, RESPONSIBILITIES AND REPORTING - Comments: OBSERVED NO VERIFIABLE HEALTH POLICY ON SITE AT TIME OF INSPECTION. LEFT TEMPLATE AND INSTRUCTED TO MAINTAIN COPIES OF HEALTH POLICY SIGNED BY ALL EMPLOYEES ON SITE AT ALL TIMES. PRIORITY FOUNDATION VIOLATION 7-38-010. NO CITATION ISSUED. | 5. PROCEDURES FOR RESPONDING TO VOMITING AND DIARRHEAL EVENTS - Comments: OBSERVED NO KIT AND PROCEDURE ON SITE FOR RESPONDING TO VOMIT AND DIARRHEA. LEFT ONE PAGER AND INSTRUCTED TO MAINTAIN PROCEDURE AND KIT WITH ALL NECESSARY SUPPLIES INCLUDING A SANITIZER RATED AS EFFECTIVE AGAINST NOROVIRUS ON SITE AT ALL TIMES. PRIORITY FOUNDATION VIOLATION 7-38-005. NO CITATION ISSUED. | 10. ADEQUATE HANDWASHING SINKS PROPERLY SUPPLIED AND ACCESSIBLE - Comments: OBSERVED NO HAND WASHING SIGNAGE AT WASHBOWL SINK LOCATED IN TOILET ROOM. INSTRUCTED TO PROVIDE. | 22. PROPER COLD HOLDING TEMPERATURES - Comments: OBSERVED THE FOLLOWING TCS ITEMS AT THE FOLLOWING IMPROPER TEMPERATURE. IN TWO DRAWER COOLER UNDERNEATH GRILL CHICKEN AT 51.6F, CHICKEN AT 50.1F, AND HAMBURGERS AT 48.9F. IN 3 DOOR COOLER OPPOSITE GRILL ITALIAN BEEF AT 54.9F. MANAGEMENT VOLUNTARILY DISCARDED AND DENATURED APPROXIMATELY 20LBS OF PRODUCT VALUED AT APPROXIMATELY $50. INSTRUCTED TO MAINTAIN ALL TCS ITEMS AT OR BELOW 41F AT ALL TIMES. PRIORITY VIOLATION 7-38-005. CITATION ISSUED. | 33. PROPER COOLING METHODS USED; ADEQUATE EQUIPMENT FOR TEMPERATURE CONTROL - Comments: OBSERVED TWO DRAWER COOLER UNDERNEATH GRILL CONTAINING TCS ITEMS SUCH AS CHICKEN AND HAMBURGERS MAINTAINING AN IMPROPER AMBIENT TEMPERATURE OF 54.1F. OBSERVED 3 DOOR COOLER OPPOSITE GRILL CONTAINING TCS ITEMS SUCH AS ITALIAN BEEF MAINTAINING AN IMPROPER AMBIENT TEMPERATURE OF 50.9F. BOTH COOLERS TAGGED AND HELD FOR INSPECTION. INSTRUCTED TO MAINTAIN ALL COOLERS AT OR BELOW 41F AT ALL TIMES. PRIORITY VIOLATION 7-38-005. CITATION ISSUED. | 36. 
THERMOMETERS PROVIDED & ACCURATE - Comments: OBSERVED NO THERMOMETERS INSIDE COOLERS ALONG HOT LINE. INSTRUCTED TO PROVIDE. | 47. FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED - Comments: OBSERVED BROKEN GASKET AND DOOR THAT FALLS OFF ON 3 DOOR COOLER LOCATED OPPOSITE GRILL. INSTRUCTED TO REPAIR OR REPLACE AND MAINTAIN IN GOOD CONDITION. | 47. FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED - Comments: OBSERVED ALL CUTTING BOARDS ALONG HOT LINE COVERED IN DEEP GROOVES AND DARK STAINS. INSTRUCTED TO REPAIR OR REPLACE AND MAINTAIN SMOOTH, CLEANABLE CUTTING SURFACES. | 48. WAREWASHING FACILITIES: INSTALLED, MAINTAINED & USED; TEST STRIPS - Comments: OBSERVED MIDDLE COMPARTMENT OF 3 COMPARTMENT SINK UNABLE TO BE FILLED WITH WATER DUE TO NON-FUNCTIONING BUILT IN SINK STOPPER. INSTRUCTED TO REPAIR OR REPLACE SO THAT ALL 3 COMPARTMENTS ARE ABLE TO BE FILLED WITH WATER. | 51. PLUMBING INSTALLED; PROPER BACKFLOW DEVICES - Comments: OBSERVED NO BACKFLOW PREVENTION DEVICE CONNECTED TO MOP SINK. INSTRUCTED TO INSTALL IN ACCORDANCE WITH ALL RULES, LAWS, AND REGULATIONS. | 55. PHYSICAL FACILITIES INSTALLED, MAINTAINED & CLEAN - Comments: OBSERVED LARGE HOLE IN CEILING ABOVE FOOD STORAGE AREA OPPOSITE CUSTOMER SEATING COUNTER. INSTRUCTED TO REPAIR AND MAINTAIN IN GOOD CONDITION. | 55. PHYSICAL FACILITIES INSTALLED, MAINTAINED & CLEAN - Comments: OBSERVED BUILD UP OF GREASE, GRIME, AND FOOD DEBRIS UNDERNEATH HOT LINE. INSTRUCTED TO DETAIL CLEAN AND MAINTAIN IN GOOD CONDITION. | 57. ALL FOOD EMPLOYEES HAVE FOOD HANDLER TRAINING - Comments: OBSERVED NO FOOD HANDLER TRAINING CERTIFICATES ON SITE AT TIME OF INSPECTION. INSTRUCTED ALL EMPLOYEES SHOULD COMPLETE TRAINING AND MAINTAIN RECORDS THEREOF ON SITE AT ALL TIMES. | 60. 
PREVIOUS CORE VIOLATION CORRECTED - Comments: OBSERVED THE FOLLOWING REPEAT CORE VIOLATIONS NOT CORRECTED FROM PREVIOUS INSPECTION #2373364 DATED 6/8/2020: #47 ' TORN RUBBER DOOR GASKET ON THE COOKS LINE DRAWER COOLER. MUST REPLACE.' #47 ' REAR PREP CUTTING BOARD IN POOR REPAIR WITH DEEP GROOVES AND CUTS. MUST RESURFACE OR REPLACE.' #58 'NO PROOF OF ALLERGEN TRAINING OR CERTIFICATES FOR ALL THE CITY OF CHICAGO CERTIFIED FOOD MANAGERS.' INSTRUCTED TO CORRECT AND COME INTO COMPLIANCE. PRIORITY FOUNDATION VIOLATION 7-42-090. NO CITATION ISSUED.","[MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL EMPLOYEE; KNOWLEDGE, RESPONSIBILITIES AND REPORTING, PROCEDURES FOR RESPONDING TO VOMITING AND DIARRHEAL EVENTS, ADEQUATE HANDWASHING SINKS PROPERLY SUPPLIED AND ACCESSIBLE, PROPER COLD HOLDING TEMPERATURES, PROPER COOLING METHODS USED; ADEQUATE EQUIPMENT FOR TEMPERATURE CONTROL, THERMOMETERS PROVIDED & ACCURATE, FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED, FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED, WAREWASHING FACILITIES: INSTALLED, MAINTAINED & USED; TEST STRIPS, PLUMBING INSTALLED; PROPER BACKFLOW DEVICES, PHYSICAL FACILITIES INSTALLED, MAINTAINED & CLEAN, PHYSICAL FACILITIES INSTALLED, MAINTAINED & CLEAN, ALL FOOD EMPLOYEES HAVE FOOD HANDLER TRAINING, PREVIOUS CORE VIOLATION CORRECTED]"
51,"3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL EMPLOYEE; KNOWLEDGE, RESPONSIBILITIES AND REPORTING - Comments: OBSERVED NO WRITTEN EMPLOYEE HEALTH POLICY ON SITE. INSTRUCTED FACILITY TO ESTABLISH AN APPROPRIATE EMPLOYEE HEALTH POLICY AND MAINTAIN WITH VERIFIABLE SIGNED COPIES ON SITE FOR ALL FOOD EMPLOYEES. PRIORITY FOUNDATION VIOLATION #7-38-010. NO CITATION ISSUED. | 10. ADEQUATE HANDWASHING SINKS PROPERLY SUPPLIED AND ACCESSIBLE - Comments: OBSERVED NO SOAP AT EMPLOYEE WASHROOM HANDSINK, DISHWASHING HANDSINK, AND PREP HANDSINK. HANDWASHING SINKS MUST BE MAINTAINED AT ALL TIMES. PRIORITY FOUNDATION. 7-38-030(C). CITATION ISSUED. | 10. ADEQUATE HANDWASHING SINKS PROPERLY SUPPLIED AND ACCESSIBLE - Comments: OBSERVED NO PAPER TOWELS AT DISH AREA HANDWASHING SINK. INSTRUCTED TO PROVIDE. PRIORITY FOUNDATION. 7-38-030(C). CITATION COMBINED WITH ABOVE VIOLATION. | 22. PROPER COLD HOLDING TEMPERATURES - Comments: OBSERVED 3 CASES OF CHICKEN WINGS INSIDE 2 DOOR REFRIGERATORS AT IMPROPER TEMPERATURES OF 45.5F, 62.2F, AND 47.6F, OBSERVED TURKEY LINKS IN 2 DOOR REFRIGERATOR HOLDING INTEERNAL TEMPERATURE OF 44.4F. INSTRUCTED TO DISCARD APPROXIMATELY 240 POUNDS OF CHICKEN AND TURKEY PRODUCTS AT A COST OF $422. PRIORITY. 7-38-005. CITATION ISSUED. | 23. PROPER DATE MARKING AND DISPOSITION - Comments: OBSERVED READY-TO-EAT, TIME/TEMPERATURE CONTROL FOR SAFETY (TCS) FOODS (COLESLAW SALAD, TURKEY LINKS, AND TURKEY TIPS) NOT DATE MARKED TO INDICATE THE PRODUCT'S NAME AND DATE IN WHICH THE FOOD MUST BE SOLD, DISCARDED OR CONSUMED WITHIN 7 DAYS. INSTRUCTED TO PROVIDE PROPER LABELS FOR DATE MARKING AND DISCARD DATE FOR REFRIGERATED, READY-TO-EAT, TCS FOODS HELD OVER 24HRS PLACED IN ALL COLD-HOLDING UNITS. PRIORITY FOUNDATION. 7-38-005. CITATION ISSUED. | 33. PROPER COOLING METHODS USED; ADEQUATE EQUIPMENT FOR TEMPERATURE CONTROL - Comments: OBSERVED TWO DOOR COOLER AT IMPROPER AMBIENT AIR TEMPERATURE OF 65F. ALL COLD HOLDING EQUIPMENT MUST MAINTAIN TEMPERATURE OF 41F OR BELOW. 
EQUIPMENT HAS BEEN TAGGED FOR RE-INSPECTION. PRIORITY. 7-38-005. CITATION ISSUED. | 36. THERMOMETERS PROVIDED & ACCURATE - Comments: NO METAL STEM FOOD THERMOMETER ON SITE. INSTRUCTED TO PROVIDE. PRIORITY FOUNDATION. 7-38-005. CITATION ISSUED. | 36. THERMOMETERS PROVIDED & ACCURATE - Comments: NO THERMOMETERS TO MEASURE AMBIENT AIR TEMPERATURE INSIDE COOLERS. INSTRUCTED TO PROVIDE. | 37. FOOD PROPERLY LABELED; ORIGINAL CONTAINER - Comments: OBSERVED BUCKET OF SALT AND PEPPER AND BIN OF FLOUR WITHOUT LABELS. ALL FOODS THAT ARE NOT IN ORIGINAL CONTAINER MUST BE LABELED WITH COMMON NAME. | 47. FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED - Comments: OBSERVED NO GASKETS ON TWO 2-DOOR COOLERS. INSTRUCTED TO PROVIDE AND MAINTAIN. | 48. WAREWASHING FACILITIES: INSTALLED, MAINTAINED & USED; TEST STRIPS - Comments: NO CHEMICAL SANITIZING TEST STRIPS ON SITE. INSTRUCTED TO PROVIDE PROPER CHEMICAL TEST STRIPS FOR 3 COMPARTMENT SINK. PRIORITY FOUNDATION. 7-38-005. CITATION ISSUED. | 49. NON-FOOD/FOOD CONTACT SURFACES CLEAN - Comments: OBSERVED HEAVY GREASE ACCUMULATION INSIDE MIDDLE FRYER AND FOOD DEBRIS INSIDE COOLERS. MUST CLEAN AND MAINTAIN. | 51. PLUMBING INSTALLED; PROPER BACKFLOW DEVICES - Comments: OBSERVED BROKEN STOPPER AT WASH COMPARTMENT OF 3 COMPARTMENT SINK AND BROKEN COLD WATER FAUCET AT 3 COMPARTMENT SINK. MUST REPAIR AND MAINTAIN | 53. TOILET FACILITIES: PROPERLY CONSTRUCTED, SUPPLIED, & CLEANED - Comments: OBSERVED NO COVERED WASTE RECEPTACLE INSIDE EMPLOYEE UNISEX WASHROOM. MUST PROVIDE. | 56. ADEQUATE VENTILATION & LIGHTING; DESIGNATED AREAS USED - Comments: OBSERVED GREASE AND DUST ACCUMULATION ON VENTIALTION FILTERS. MUST CLEAN AND MAINTAIN. | 56. ADEQUATE VENTILATION & LIGHTING; DESIGNATED AREAS USED - Comments: OBSERVED MISSING LIGHTS INSIDE EMPLOYEE WASHROOM. MUST PROVIDE ADEQUATE LIGHTING AND MAINTAIN | 57. 
ALL FOOD EMPLOYEES HAVE FOOD HANDLER TRAINING - Comments: MUST PROVIDE FOOD HANDLERS TRAINING FOR ALL EMPLOYEES. | 58. ALLERGEN TRAINING AS REQUIRED - Comments: MUST PROVIDE FOOD ALLERGEN TRAINING FOR ALL CERTIFIED FOOD MANAGERS.","[MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL EMPLOYEE; KNOWLEDGE, RESPONSIBILITIES AND REPORTING, ADEQUATE HANDWASHING SINKS PROPERLY SUPPLIED AND ACCESSIBLE, ADEQUATE HANDWASHING SINKS PROPERLY SUPPLIED AND ACCESSIBLE, PROPER COLD HOLDING TEMPERATURES, PROPER DATE MARKING AND DISPOSITION, PROPER COOLING METHODS USED; ADEQUATE EQUIPMENT FOR TEMPERATURE CONTROL, THERMOMETERS PROVIDED & ACCURATE, THERMOMETERS PROVIDED & ACCURATE, FOOD PROPERLY LABELED; ORIGINAL CONTAINER, FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED, WAREWASHING FACILITIES: INSTALLED, MAINTAINED & USED; TEST STRIPS, NON-FOOD/FOOD CONTACT SURFACES CLEAN, PLUMBING INSTALLED; PROPER BACKFLOW DEVICES, TOILET FACILITIES: PROPERLY CONSTRUCTED, SUPPLIED, & CLEANED, ADEQUATE VENTILATION & LIGHTING; DESIGNATED AREAS USED, ADEQUATE VENTILATION & LIGHTING; DESIGNATED AREAS USED, ALL FOOD EMPLOYEES HAVE FOOD HANDLER TRAINING, ALLERGEN TRAINING AS REQUIRED]"
54,"1. PERSON IN CHARGE PRESENT, DEMONSTRATES KNOWLEDGE, AND PERFORMS DUTIES - Comments: PIC DOESN'T HAVE A CERTIFIED FOOD MANAGERS CERTIFICATE.MUST PROVIDE AND MAINTAIN.(PRIORITY FOUNDATION 7-38-012) | 2. CITY OF CHICAGO FOOD SERVICE SANITATION CERTIFICATE - Comments: OBSERVED NO CERTIFIED FOOD MANAGER ON DUTY WHILE TCS FOODS ARE BEING PREPARED,HANDLED AND SERVED SUCH AS (CHICKEN,POTATO,RICE,ETC)MUST BE ON SITE AT ALL TIMES.MANAGER ARRIVED ON SITE AFTER INSPECTION WAS CONDUCTED AT 11:47 A.M.(PRIROITY FOUNDATION 7-38-012)(CITATION ISSUED) | 3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL EMPLOYEE; KNOWLEDGE, RESPONSIBILITIES AND REPORTING - Comments: Inspector Comments: Violation Codes: 2-102.14(O) Inspector Comments: OBSERVED NO SIGNED EMPLOYEES HEALTH POLICIES.MUST PROVIDE AND MAINTAIN.(PRIORITY FOUNDATION 7-38-010) | 5. PROCEDURES FOR RESPONDING TO VOMITING AND DIARRHEAL EVENTS - Comments: Inspector Comments: Violation Codes: 2-501.11 Inspector Comments: OBSERVED NO CLEAN-UP POLICY PROCEDURE AND ITEMS FOR VOMITING AND DIARRHEA.MUST PROVIDE AND MAINTAIN.(PRIORITY FOUNDATION 7-38-005) | 10. ADEQUATE HANDWASHING SINKS PROPERLY SUPPLIED AND ACCESSIBLE - Comments: OBSERVED NO HAND DRYING DEVICES AT HAND SINKS IN PREP ,BAR AREAS & DISH WASHING AREAS. TOWELS WERE PROVIDED DURING INSPECTION.MUST PROVIDE AND MAINTAIN AT ALL TIMES.(PRIROITY FOUNDATION 7-38-030(C)(CITATION ISSUED)(COS) | 25. CONSUMER ADVISORY PROVIDED FOR RAW/UNDERCOOKED FOOD - Comments: Inspector Comments: Violation Codes: 3-603.11 Inspector Comments: OBSERVED NO CONSUMER FOOD ADVISORY DISCLOSURE NOR REMINDER ON MENU OF CONSUMING RAW AND UNDER COOKED FOODS ON MENU.MUST PROVIDE AND MAINTAIN.(PRIORITY FOUNDATION 7-38-005) | 38. INSECTS, RODENTS, & ANIMALS NOT PRESENT - Comments: Inspector Comments: Violation Codes: 6-202.15 Inspector Comments: OBSERVED AN APPX. '1/2-3/4' GAP IN RIGHT CORNER OF DINING AREA DOOR AND GAP AT BOTTOM OF DOUBLE NORTH DOOR(901).MUST MAKE DOORS TIGHT FITTING. | 45. 
SINGLE-USE/SINGLE-SERVICE ARTICLES: PROPERLY STORED & USED - Comments: MUST STORE PLASTIC WEAR WITH HANDLES IN UPRIGHT POSITION. | 47. FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED - Comments: MUST STORE A LOCK ON DESSERT DISPLAY UNIT ON SALES FLOOR IN DINING AREA. | 47. FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED - Comments: Inspector Comments: Violation Codes: 4-501.11 Inspector Comments: MUST APPLY A SEALANT OR PAINT RAW WOOD SHELVING UNITS AT BASEMENT FOOD STORAGE AREA. | 49. NON-FOOD/FOOD CONTACT SURFACES CLEAN - Comments: Inspector Comments: Violation Codes: 4-601.11(C) Inspector Comments: MUST CLEAN DEBRIS FROM FRYER CABINETS,WALK IN COOLER SHELVING UNITS, UNUSED COOLERS AND FREEZERS AND OTHER EQUIPMEN | 51. PLUMBING INSTALLED; PROPER BACKFLOW DEVICES - Comments: Inspector Comments: Violation Codes: 4-301.16 Inspector Comments: A DUMP SINK IS NEEDED FOR DRINK STATION SERVING AREA IN DINING ROOM.MUST INSTALL AND MAINTAIN. | 51. PLUMBING INSTALLED; PROPER BACKFLOW DEVICES - Comments: Inspector Comments: Violation Codes: 5-204.12 Inspector Comments: MUST INSTALL A BACK FLOW DEVICE ON ICE MACHINE TO BE SEEN FOR SERVICING. | 51. PLUMBING INSTALLED; PROPER BACKFLOW DEVICES - Comments: Inspector Comments: Violation Codes: 5-205.15 Inspector Comments: MUST REPAIR OR REPLACE LEAKY STOPPER AT 1 ST FL. 3- COMPARTMENT SINK. | 54. GARBAGE & REFUSE PROPERLY DISPOSED; FACILITIES MAINTAINED - Comments: OBSERVED BOTH OUTSIDE GARBAGE DUMPSTER OVERFLOWING WITH TRASH AND BOXES ABOVE RIMS AND LIDS WIDE OPEN.MUST HAVE LIDS CLOSED AND TIGHT FITTING ON OUTSIDE DUMPSTER AND MAINTAIN.(PRIORITY FOUNDATION 7-38-020(B)(CITATION ISSUED) | 55. PHYSICAL FACILITIES INSTALLED, MAINTAINED & CLEAN - Comments: Inspector Comments: Violation Codes: 6-201.13 Inspector Comments: MUST REPAIR OR REPLACE MISSING AND LOOSE WALL BASES IN REAR PREP AREA, BASEMENT TOILET ROOM,HOLE IN WALL IN 1ST FL. 
DISH WASHING AREA,OPENINGS AROUND PIPES AT HAND SINK IN PREP AND DISH WASHING AREAS,MISSING ACCESS PANEL NEAR 3- COMPARTMENT SINK IN 1ST FL. DISH WASHING AREA.MUST GROUT WALL OUTLET COVER IN MIDDLE PREP AREA.MUST SEAL GAP AT CONCRETE FLOOR IN BASEMENT IN FRONT OF SMALL BEER WALK IN COOLER AND REAR DINING AREA DOOR.MUST SCRAPE AND PAINT PEELING PAINT ON FLOOR IN BASEMENT TOILET ROOM AND DRY FOOD STORAGE TO BE SMOOTH AND EASILY CLEANABLE. | 55. PHYSICAL FACILITIES INSTALLED, MAINTAINED & CLEAN - Comments: Inspector Comments: Violation Codes: 6-501.114 Inspector Comments: MUST ELEVATE,REMOVE AND ORGANIZE ARTICLES OFF OF FLOOR AND AWAY FROM WALLS THROUGHOUT BASEMENT. | 56. ADEQUATE VENTILATION & LIGHTING; DESIGNATED AREAS USED - Comments: Inspector Comments: Violation Codes: 6-202.11 Inspector Comments: MUST REPAIR OR REPLACE BURNTOUT LIGHTS AND MISSING LIGHT SHIELDS IN PREP AND DISH WASHING AREAS ALONG WITH END CAPS. | 57. ALL FOOD EMPLOYEES HAVE FOOD HANDLER TRAINING - Comments: OBSERVED NO FOOD HANDLERS TRAINING FOR EMPLOYEES. MUST PROVIDE AND MAINTAIN. | 58. ALLERGEN TRAINING AS REQUIRED - Comments: OBSERVED NO FOOD ALLERGEN TRAINING FOR FOOD MANAGERS. 
MUST PROVIDE AND MAINTAIN.","[PERSON IN CHARGE PRESENT, DEMONSTRATES KNOWLEDGE, AND PERFORMS DUTIES, CITY OF CHICAGO FOOD SERVICE SANITATION CERTIFICATE, MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL EMPLOYEE; KNOWLEDGE, RESPONSIBILITIES AND REPORTING, PROCEDURES FOR RESPONDING TO VOMITING AND DIARRHEAL EVENTS, ADEQUATE HANDWASHING SINKS PROPERLY SUPPLIED AND ACCESSIBLE, CONSUMER ADVISORY PROVIDED FOR RAW/UNDERCOOKED FOOD, INSECTS, RODENTS, & ANIMALS NOT PRESENT, SINGLE-USE/SINGLE-SERVICE ARTICLES: PROPERLY STORED & USED, FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED, FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED, NON-FOOD/FOOD CONTACT SURFACES CLEAN, PLUMBING INSTALLED; PROPER BACKFLOW DEVICES, PLUMBING INSTALLED; PROPER BACKFLOW DEVICES, PLUMBING INSTALLED; PROPER BACKFLOW DEVICES, GARBAGE & REFUSE PROPERLY DISPOSED; FACILITIES MAINTAINED, PHYSICAL FACILITIES INSTALLED, MAINTAINED & CLEAN, PHYSICAL FACILITIES INSTALLED, MAINTAINED & CLEAN, ADEQUATE VENTILATION & LIGHTING; DESIGNATED AREAS USED, ALL FOOD EMPLOYEES HAVE FOOD HANDLER TRAINING, ALLERGEN TRAINING AS REQUIRED]"
65,"1. PERSON IN CHARGE PRESENT, DEMONSTRATES KNOWLEDGE, AND PERFORMS DUTIES - Comments: PIC DOESN'T HAVE A FOOD MANAGERS CERTIFICATE.MUST PROVIDE AND MAINTAIN.(PRIORITY FOUNDATION 7-38-012) | 2. CITY OF CHICAGO FOOD SERVICE SANITATION CERTIFICATE - Comments: OBSERVED NO CERTIFIDE FOOD MANAGERS CERTIFICATE POSTED TO VIEW NOR MANAGER ON DUTY WHILE TCS FOODS ARE BEING PREPARED,HANDLED AND SERVED SUCH AS (LOBSTER,STEAK,MASHED POTATO).MUST PROVIDE ,MAINTAIN AND ON SITE AT ALL TIMES.(PRIORITY FOUNDATION 7-38-012)(CITATION ISSUED) | 21. PROPER HOT HOLDING TEMPERATURES - Comments: OBSERVED IMPROPER TEMPERATURES OF TCS FOODS SUCH AS 18LBS. COOKED MASHED POTATO 113.0F-121.6F.15LBS. COOKED CHICKEN BREAST 94.3F-103.3F. 3LBS. COOKED ASPARUGUS 77.2F.5LBS. COOKED PHILLY STEAK 120.2F.PRODUCT WAS DISCARDED BY MANAGER.MUST HAVE HOT HOLDING FOODS AT 135.0F OR ABOVE. APPX. 41LBS. $62.(PRIORITY 7-38-005)(COS)(CITATION ISSUED) | 22. PROPER COLD HOLDING TEMPERATURES - Comments: OBSERVED IMPROPER TEMPERATURE OF COLD HOLDING FOODS SUCH AS 5.5LBS.RAW LOBSTER TAILS 57.7F-58.3F IN A BUCKET OF WATER ON SHELVING UNIT.PRODUCT WAS DISCARDED BY MANAGER.MUST HAVE COLD HOLDING FOODS AT 41.0F OR BELOW.APPX. 5.5LBS. $160.(PRIROITY 7-38-005)(COS)(CONSOLIDATED VIOLATION) | 36. THERMOMETERS PROVIDED & ACCURATE - Comments: OBSERVED NO PROBE THERMOMETER FOR TAKING FOOD TEMPERATURES. MUST PROVIDE A PROBE THERMOMETER AND MAINTAIN.(PRIORITY FOUNDATION 7-38-005)(CITATION ISSUED) | 37. FOOD PROPERLY LABELED; ORIGINAL CONTAINER - Comments: OBSERVED REPACKAGED DESSERTS ON DISPLAY FOR SALE WITHOUT PROPER INFO.MUST LABEL AND MAINTAIN.MUST LABEL FOOD STORAGE CONTAINERS WHEN FOOD IS NOT IN ORIGINAL PACKAGE. | 39. CONTAMINATION PREVENTED DURING FOOD PREPARATION, STORAGE & DISPLAY - Comments: MUST PROVIDE A SPLASH GUARD AT HAND SINK NEXT TO 3- COMPARTMENT SINK. | 44. UTENSILS, EQUIPMENT & LINENS: PROPERLY STORED, DRIED, & HANDLED - Comments: MUST INVERT MULTI-USE UTENSILS ON DISH STORAGE SHELVES. | 47. 
FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED - Comments: OBSERVED DARK AND BLACK DISCOLORED SURFACE ON CUTTING BOARDS.MUST REPAIR OR REPLACE. | 47. FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED - Comments: MUST REPAIR OR REPLACE WORN DOOR GASKET ON 2- DOOR PREP COOLER.MUST APPLY A SEALANT OR PAINT RAW WOOD FOOD STORAGE SHELVES IN PANTRY. | 48. WAREWASHING FACILITIES: INSTALLED, MAINTAINED & USED; TEST STRIPS - Comments: OBSERVED NO CHEMICAL TEST KIT TO CHECK SANITIZING SOLUTION PPM'S AT 3- COMPARTMENT SINK.MUST PROVIDE AND MAINTAIN.(PRIORITY FOUNDATION 7-38-005) | 49. NON-FOOD/FOOD CONTACT SURFACES CLEAN - Comments: OBSERVED GREASE AND FOOD DEBRIS BUILD UP ON FRYER CABINETS,PREP TABLES,PREP COOLERS, REACH IN COOLERS & FREEZER,STORAGE SHELVES. | 50. HOT & COLD WATER AVAILABLE; ADEQUATE PRESSURE - Comments: OBSERVED NO HOT RUNNING WATER ON PREMISES WATER TEMPERATURES ARE REAR HAND SINK 66.0F,3- COMPARTMENT SINK 66.3F,CUSTOMERS TOILET ROOM 66.2F.MUST PROVIDE AT LEAST 110.0F AT HAND SINKS AND 110.0F AT 3- COMPARTMENT SINK.MUST REPAIR AND MAINTAIN.(PRIORITY 7-38-030(C)(CITATION ISSUED) | 51. PLUMBING INSTALLED; PROPER BACKFLOW DEVICES - Comments: MUST REPAIR OR REPLACE LEAKY PIPE AT 3- COMP SINK IN MIDDLE BASIN UNDERNEATH AND LEAKY 2- DOOR REFRIGERATOR. | 53. TOILET FACILITIES: PROPERLY CONSTRUCTED, SUPPLIED, & CLEANED - Comments: OBSERVED EMPLOYEES TOILET IN REAR PREP AREA OVER FLOWING WITH URINE .MUST REPAIR AND MAINTAIN.(PRIORITY FOUNDATION 7-38-030(C)(CITATION ISSUED) | 55. PHYSICAL FACILITIES INSTALLED, MAINTAINED & CLEAN - Comments: MUST CLEAN DEBRIS BUILD UP FROM WALLS IN PREP AND DISH WASHING AREAS. | 55. PHYSICAL FACILITIES INSTALLED, MAINTAINED & CLEAN - Comments: MUST REPAIR OR REPLACE OPENING IN LOWER WALL UNDER 3- COMPARTMENT SINK AND HAND WASHING SINK IN REAR PREP AREA. | 56. 
ADEQUATE VENTILATION & LIGHTING; DESIGNATED AREAS USED - Comments: MUST REPLACE MISSING FILTER AT VENTILATION HOOD. | 56. ADEQUATE VENTILATION & LIGHTING; DESIGNATED AREAS USED - Comments: MUST CLEAN DEBRIS BUILD UP AT HOOD AND FILTERS OF COOKING EQUIPMENT.","[PERSON IN CHARGE PRESENT, DEMONSTRATES KNOWLEDGE, AND PERFORMS DUTIES, CITY OF CHICAGO FOOD SERVICE SANITATION CERTIFICATE, PROPER HOT HOLDING TEMPERATURES, PROPER COLD HOLDING TEMPERATURES, THERMOMETERS PROVIDED & ACCURATE, FOOD PROPERLY LABELED; ORIGINAL CONTAINER, CONTAMINATION PREVENTED DURING FOOD PREPARATION, STORAGE & DISPLAY, UTENSILS, EQUIPMENT & LINENS: PROPERLY STORED, DRIED, & HANDLED, FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED, FOOD & NON-FOOD CONTACT SURFACES CLEANABLE, PROPERLY DESIGNED, CONSTRUCTED & USED, WAREWASHING FACILITIES: INSTALLED, MAINTAINED & USED; TEST STRIPS, NON-FOOD/FOOD CONTACT SURFACES CLEAN, HOT & COLD WATER AVAILABLE; ADEQUATE PRESSURE, PLUMBING INSTALLED; PROPER BACKFLOW DEVICES, TOILET FACILITIES: PROPERLY CONSTRUCTED, SUPPLIED, & CLEANED, PHYSICAL FACILITIES INSTALLED, MAINTAINED & CLEAN, PHYSICAL FACILITIES INSTALLED, MAINTAINED & CLEAN, ADEQUATE VENTILATION & LIGHTING; DESIGNATED AREAS USED, ADEQUATE VENTILATION & LIGHTING; DESIGNATED AREAS USED]"


## Tokenize each regulation description

In [57]:
import nltk
import nltk.corpus  
from nltk.tokenize import word_tokenize
from nltk.text import Text

In [58]:
# Flatten the per-record description lists into one list of descriptions
descriptions = [d for row in df['descriptions'] for d in row]
# Join everything into a single string so word_tokenize can process it
tokens = ", ".join(descriptions)
tokens[:200]

'PROPER EATING, TASTING, DRINKING, OR TOBACCO USE, ADEQUATE HANDWASHING SINKS PROPERLY SUPPLIED AND ACCESSIBLE, ADEQUATE HANDWASHING SINKS PROPERLY SUPPLIED AND ACCESSIBLE, COMPLIANCE WITH VARIANCE/SPE'

In [59]:
words = word_tokenize(tokens)
words[:15]

['PROPER',
 'EATING',
 ',',
 'TASTING',
 ',',
 'DRINKING',
 ',',
 'OR',
 'TOBACCO',
 'USE',
 ',',
 'ADEQUATE',
 'HANDWASHING',
 'SINKS',
 'PROPERLY']

## Find top-10 tokens (for the whole table)

In [60]:
fdist = nltk.FreqDist(words)
print(fdist)

# fdist.items() would return every (token, count) pair
token_all = fdist.most_common(10)

<FreqDist with 3025 samples and 3730531 outcomes>
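
`nltk.FreqDist` is a subclass of `collections.Counter`, so the counting step can be illustrated without NLTK. The following is a minimal sketch on a made-up token list (`toy_tokens` is hypothetical, not taken from the data). It shows how "samples" (distinct tokens) and "outcomes" (total tokens) relate to `most_common`.

```python
from collections import Counter

# Hypothetical token list standing in for `words`.
toy_tokens = ["FOOD", ",", "CLEAN", ",", "FOOD", "AND", ",", "FOOD", ",", "CLEAN"]

counts = Counter(toy_tokens)

# most_common(k) returns the k (token, count) pairs with the highest counts.
top_two = counts.most_common(2)
print(top_two)

# Distinct tokens ("samples") and total tokens ("outcomes").
print(len(counts), sum(counts.values()))
```

As in the real output, punctuation can dominate the ranking when no cleaning has been applied.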


## Clean data: convert to lower case, remove stopwords, punctuation, numbers, etc

In [61]:
stopwords = set(nltk.corpus.stopwords.words('english'))
list(stopwords)[:10]

['hers',
 'if',
 'm',
 "that'll",
 'further',
 'after',
 'in',
 'few',
 'once',
 "mightn't"]

In [66]:
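# Lower-case before the stop-word test: NLTK's stop-word list is lower case,
# while the tokens are still upper case at this point.
words = [word.lower() for word in words
         if word.isalpha() and word.lower() not in stopwords]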
words[:20]

['proper',
 'eating',
 'tasting',
 'drinking',
 'tobacco',
 'use',
 'adequate',
 'handwashing',
 'sinks',
 'properly',
 'supplied',
 'accessible',
 'adequate',
 'handwashing',
 'sinks',
 'properly',
 'supplied',
 'accessible',
 'compliance',
 'insects']
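
The order of lower-casing and stop-word filtering matters because the stop-word list is lower case. The sketch below illustrates this with a tiny, hypothetical stop-word set and token list (`toy_stopwords` and `raw_tokens` are assumptions, not the NLTK list or the real data).

```python
# Hypothetical stand-ins for the NLTK stop-word set and the raw tokens.
toy_stopwords = {"or", "and", "with"}
raw_tokens = ["PROPER", "EATING", ",", "OR", "TOBACCO", "USE", "&",
              "NON-FOOD", "AND", "CLEAN"]

# Case-sensitive test: upper-case "OR" and "AND" slip past the lower-case set.
naive = [t.lower() for t in raw_tokens
         if t.isalpha() and t not in toy_stopwords]

# Lower-casing before the membership test removes them as intended.
cleaned = [t.lower() for t in raw_tokens
           if t.isalpha() and t.lower() not in toy_stopwords]

print(naive)
print(cleaned)
```

Note that `str.isalpha()` also drops compound tokens such as `NON-FOOD`, not only punctuation and numbers.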

In [67]:
fdist = nltk.FreqDist(words)
print(fdist)

token_preprocessed = fdist.most_common(10)
token_preprocessed

<FreqDist with 2057 samples and 2273832 outcomes>


[('maintained', 83838),
 ('food', 83568),
 ('properly', 66966),
 ('clean', 66375),
 ('constructed', 65594),
 ('equipment', 64487),
 ('installed', 63103),
 ('cleaning', 48289),
 ('surfaces', 47402),
 ('contact', 44559)]

## Find top-10 tokens after applying Porter stemming to the tokens obtained in step 4.

In [69]:
porter = nltk.PorterStemmer()

token_porter = [porter.stem(t) for t in words]
fdist = nltk.FreqDist(token_porter)

token_porter = fdist.most_common(10)
token_porter

[('clean', 140318),
 ('food', 88048),
 ('maintain', 87945),
 ('properli', 66966),
 ('construct', 65594),
 ('equip', 64493),
 ('instal', 63162),
 ('surfac', 47463),
 ('contact', 44560),
 ('method', 40861)]
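
The jump of `clean` to first place follows from several surface forms collapsing into one stem. The sketch below re-counts three entries from the cleaned top-10 using a hand-written lookup table. The table `toy_stem` is an assumption standing in for a stemmer and does not implement NLTK's Porter rules.

```python
from collections import Counter

# Counts copied from the cleaned top-10 (before stemming).
before = Counter({"clean": 66375, "cleaning": 48289, "maintained": 83838})

# Hand-written stand-in for a stemmer (assumption, not the Porter algorithm).
toy_stem = {"clean": "clean", "cleaning": "clean", "maintained": "maintain"}

after = Counter()
for token, count in before.items():
    # Forms that share a stem pool their counts.
    after[toy_stem[token]] += count

print(after.most_common())
```

These two forms account for 114,664 of the 140,318 occurrences reported for `clean`; the remainder must come from other word forms with the same stem that fall outside the pre-stemming top-10.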

## Find top-10 tokens after applying Lancaster stemming to the tokens obtained in step 4.

In [71]:
lancaster = nltk.LancasterStemmer()

token_lancaster = [lancaster.stem(t) for t in words]
fdist = nltk.FreqDist(token_lancaster)

token_lancaster = fdist.most_common(10)
token_lancaster

[('cle', 147846),
 ('food', 88048),
 ('maintain', 87945),
 ('prop', 82874),
 ('construct', 65594),
 ('equip', 64493),
 ('instal', 63162),
 ('surfac', 47463),
 ('contact', 44560),
 ('method', 40861)]

## Find top-10 tokens after applying lemmatization to the tokens obtained in step 4.

In [72]:
wnl = nltk.WordNetLemmatizer()

token_lemma = [wnl.lemmatize(t) for t in words]
fdist = nltk.FreqDist(token_lemma)

token_lemma = fdist.most_common(10)
token_lemma

[('food', 88048),
 ('maintained', 83838),
 ('properly', 66966),
 ('clean', 66375),
 ('constructed', 65594),
 ('equipment', 64491),
 ('installed', 63103),
 ('cleaning', 48289),
 ('surface', 47463),
 ('contact', 44559)]

## Compare top-10 tokens obtained in 3, 5, 6, 7, 8.

In [73]:
token_all

[(',', 644165),
 ('AND', 170713),
 (':', 125580),
 ('MAINTAINED', 83838),
 ('FOOD', 83568),
 ('PROPERLY', 66966),
 ('CLEAN', 66375),
 ('CONSTRUCTED', 65594),
 ('EQUIPMENT', 64487),
 ('&', 63424)]

In [74]:
token_preprocessed

[('maintained', 83838),
 ('food', 83568),
 ('properly', 66966),
 ('clean', 66375),
 ('constructed', 65594),
 ('equipment', 64487),
 ('installed', 63103),
 ('cleaning', 48289),
 ('surfaces', 47402),
 ('contact', 44559)]

In [75]:
token_porter

[('clean', 140318),
 ('food', 88048),
 ('maintain', 87945),
 ('properli', 66966),
 ('construct', 65594),
 ('equip', 64493),
 ('instal', 63162),
 ('surfac', 47463),
 ('contact', 44560),
 ('method', 40861)]

In [76]:
token_lancaster

[('cle', 147846),
 ('food', 88048),
 ('maintain', 87945),
 ('prop', 82874),
 ('construct', 65594),
 ('equip', 64493),
 ('instal', 63162),
 ('surfac', 47463),
 ('contact', 44560),
 ('method', 40861)]

In [77]:
token_lemma

[('food', 88048),
 ('maintained', 83838),
 ('properly', 66966),
 ('clean', 66375),
 ('constructed', 65594),
 ('equipment', 64491),
 ('installed', 63103),
 ('cleaning', 48289),
 ('surface', 47463),
 ('contact', 44559)]
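Comparing five lists by eye is error-prone, so a small helper can list the tokens that appear in only one of two top-10 lists. This is a sketch: the lists are copied from the outputs above (tokens only), and `only_in_one` is a hypothetical helper name.

```python
preprocessed = ['maintained', 'food', 'properly', 'clean', 'constructed',
                'equipment', 'installed', 'cleaning', 'surfaces', 'contact']
porter = ['clean', 'food', 'maintain', 'properli', 'construct',
          'equip', 'instal', 'surfac', 'contact', 'method']
lancaster = ['cle', 'food', 'maintain', 'prop', 'construct',
             'equip', 'instal', 'surfac', 'contact', 'method']
lemma = ['food', 'maintained', 'properly', 'clean', 'constructed',
         'equipment', 'installed', 'cleaning', 'surface', 'contact']

def only_in_one(a, b):
    """Return (tokens only in a, tokens only in b), each in rank order."""
    return [t for t in a if t not in b], [t for t in b if t not in a]

porter_vs_lancaster = only_in_one(porter, lancaster)
cleaned_vs_lemma = only_in_one(preprocessed, lemma)

print(porter_vs_lancaster)
print(cleaned_vs_lemma)
```

The two stemmers differ in two tokens, and lemmatization changes only one token name in the top 10.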

# Discussion

## The unprocessed data is dominated by punctuation and stopwords (",", "AND", ":", "&"), which say nothing about the violations. After cleaning, the top tokens are distinct words that give some context about the regulation descriptions. The top 10 words look generally positive, but they are counted without their context: "clean" may well have appeared as "not clean". The tokens alone are therefore not sufficient evidence to conclude whether the businesses are good or bad. Since we know these descriptions come from failed inspections, they most likely point to problems with food and with the maintenance of equipment.



## Porter stemming increased the count of "clean" from 66,375 to 140,318. This is probably the result of mapping forms such as "cleaning" and "cleaned" to "clean".


In [87]:
nltk.Text(words).concordance('cleaning')

Displaying 25 of 48289 matches:
eaned good repair coving installed cleaning methods used walls ceilings attach
er code good repair surfaces clean cleaning methods lighting required minimum 
d free litter unnecessary articles cleaning equipment properly stored food han
eaned good repair coving installed cleaning methods used walls ceilings attach
er code good repair surfaces clean cleaning methods lighting required minimum 
d free litter unnecessary articles cleaning equipment properly stored sanitizi
eaned good repair coving installed cleaning methods used walls ceilings attach
er code good repair surfaces clean cleaning methods lighting required minimum 
d free litter unnecessary articles cleaning equipment properly stored food ice
eaned good repair coving installed cleaning methods used walls ceilings attach
er code good repair surfaces clean cleaning methods ventilation rooms equipmen
d free litter unnecessary articles cleaning equipment properly stored faciliti
eaned good repair co

## The word "cleaning" occurs 48,289 times, which supports our hypothesis, although it does not account for the whole increase on its own.
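A quick arithmetic check shows how much of the stemmed count the two known forms explain. The numbers are copied from the outputs above. The origin of the remainder is an assumption: it presumably comes from other forms outside the top 10, such as "cleaned".

```python
clean_before = 66375      # 'clean' after cleaning, before stemming
cleaning_before = 48289   # 'cleaning' after cleaning, before stemming
clean_porter = 140318     # 'clean' after Porter stemming

explained = clean_before + cleaning_before
remainder = clean_porter - explained   # occurrences contributed by other word forms

print(explained, remainder)
```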

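## Lancaster stemming gives the same result as Porter for eight of the top 10 tokens, with identical counts. The two differences are "cle" (147,846) instead of "clean" (140,318) and "prop" (82,874) instead of "properli" (66,966). Lancaster truncates more aggressively, so it presumably merges more word forms into one stem (for example "proper" and "properly").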

## After lemmatization, "food" becomes the top token in our data. This is probably the result of mapping "foods" to "food". Tokens such as "maintained" and "cleaning" stay unchanged because `lemmatize` treats every token as a noun by default.

In [88]:
nltk.Text(words).concordance('foods')

Displaying 25 of 4480 matches:
 labeled original container observed foods coolers stored original containers w
labeling must provide name prep date foods stored original containers utensils 
iner must provide identifying labels foods stored original containers inside co
 disclosure reminder menu raw cooked foods provide maintian priority foundation
d manager site potentially hazardous foods prepared served dish machines provid
d manager site potentially hazardous foods prepared served food original contai
d manager site potentially hazardous foods prepared served food original contai
d manager site potentially hazardous foods prepared served food contact surface
ored source sound condition spoilage foods properly labeled shellfish tags plac
d manager site potentially hazardous foods prepared served clean utensils singl
d manager site potentially hazardous foods prepared served dish washing facilit
d manager site potentially hazardous foods prepared served food original contai
d manager

## Here we can see 4,480 matches for "foods", which WordNetLemmatizer maps to "food". This matches the increase in the count of "food".
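The effect of merging plural forms can be reproduced from the counts reported above. In this sketch the `to_lemma` dictionary is a hypothetical stand-in for what `WordNetLemmatizer` does to these particular tokens. It is not a call into NLTK.

```python
from collections import Counter

# Counts before lemmatization, copied from the outputs above.
before = Counter({'food': 83568, 'foods': 4480, 'surfaces': 47402})

# Hypothetical plural -> singular mapping for these tokens (assumption).
to_lemma = {'foods': 'food', 'surfaces': 'surface'}

after = Counter()
for token, n in before.items():
    after[to_lemma.get(token, token)] += n   # merge counts under the lemma

print(after.most_common())
```

The merged count for "food" equals the 88,048 reported after lemmatization. The notebook's count for "surface" is 47,463, so the 61 occurrences missing here presumably come from the singular form already present in the data.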