## SQL: Exploring Non-Numeric Data

Use statistical functions to explore the data, and write complex queries that save intermediate results in temp tables.

- ER diagram: https://github.com/riched158/SQL/blob/master/data/erdiagram.png

Explore help requests submitted to the city of Evanston, IL.

In [1]:
%load_ext sql

Connect to the empty database made with pgAdmin.

In [2]:
%sql postgresql://postgres:eric@localhost:5432/analysis

'Connected: postgres@analysis'

### The table

In [12]:
%sql select * from evanston311 limit 2

 * postgresql://postgres:***@localhost:5432/analysis
2 rows affected.


id,priority,source,category,date_created,date_completed,street,house_num,zip,description
1340563,NONE,gov.publicstuff.com,Fire Prevention - Inspection of a Commercial Property,2016-01-13 15:03:18+00:00,2016-01-19 16:51:26+00:00,Sheridan Road,606-612,60202.0,Please contact Debbie at Ext. 222
1826017,MEDIUM,Iframe,Water Service - Question or Concern,2016-08-12 15:35:12+01:00,2016-08-27 08:00:27+01:00,Washington St,930,,"Last spring we called you to report that our sump pump that in the past 50 years has been used to eject laundry water from the basement, was running continuously since February. You came twice to check on it including taking a water sample and 'listening' at the street shut off valve. You did not detect a leak. Since then we have had three plumbers in to look at the problem. We scoped the sewer line, one listened at the interior shut off, and we turned off the building water to see if it affected the pumping. All negative. The sump pump continues to run every 90 seconds 24/7, and we have one flood when the pump was accidentally turned off. This current drought has not affected it either. We are not sure what you can do but we know that we have a constant source of water entering the sump, which one of the plumbers said would probably rule out a sewer line leak. We are a 20 unit condo building. This water is coming from somewhere, but our water bill suggests it is not an internal leak, as well as the other tests. We thought you should know."


### Count the categories

Get the count of each priority level.

In [5]:
%%sql 
SELECT priority, count(*)
FROM evanston311
GROUP BY priority;

 * postgresql://postgres:***@localhost:5432/analysis
4 rows affected.


priority,count
MEDIUM,5745
NONE,30081
HIGH,88
LOW,517


How many distinct values of zip appear in at least 100 rows?

In [10]:
%%sql
SELECT zip, count(*)
FROM evanston311
GROUP BY zip
HAVING count(*) >= 100

 * postgresql://postgres:***@localhost:5432/analysis
4 rows affected.


zip,count
60208.0,255
,5528
60201.0,19054
60202.0,11165
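
The result above has one row per qualifying zip value, so the answer to the question is the number of rows returned (4), and that figure includes the group of rows with a missing zip. The sketch below shows how the grouped query can be wrapped in a subquery to return the count directly. It is a minimal illustration, not the notebook's method: it uses Python's built-in `sqlite3` as a stand-in so it runs without a PostgreSQL server, and the `requests` table, its rows, and the threshold of 2 are all made up.

```python
import sqlite3

# Hypothetical miniature stand-in for evanston311; the zip values are made up.
rows = [("60201",)] * 3 + [("60202",)] * 2 + [(None,)] * 2 + [("60208",)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (zip TEXT)")
conn.executemany("INSERT INTO requests VALUES (?)", rows)

# Same shape as the query above, with the threshold lowered to 2 for the toy data.
groups = conn.execute(
    "SELECT zip, count(*) FROM requests GROUP BY zip HAVING count(*) >= 2"
).fetchall()

# Wrapping the grouped query in a subquery returns the number of groups directly.
n_groups = conn.execute(
    "SELECT count(*) FROM ("
    "  SELECT zip FROM requests GROUP BY zip HAVING count(*) >= 2"
    ") AS frequent"
).fetchone()[0]

# count(zip) skips NULLs, so the missing-zip group is left out of this tally.
n_non_null = conn.execute(
    "SELECT count(zip) FROM ("
    "  SELECT zip FROM requests GROUP BY zip HAVING count(*) >= 2"
    ") AS frequent"
).fetchone()[0]

print(groups, n_groups, n_non_null)
```

Whether the missing-zip group should count as a "distinct value" is a judgment call; `count(*)` includes it and `count(zip)` does not.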


How many distinct values of source appear in at least 100 rows?

In [14]:
%%sql
SELECT source, count(*)
FROM evanston311
GROUP BY source
HAVING count(*) >= 100

 * postgresql://postgres:***@localhost:5432/analysis
4 rows affected.


source,count
gov.publicstuff.com,30985
Android,444
Iframe,3670
iOS,1199


Select the five most common values of street and the count of each.

In [18]:
%%sql
SELECT street, count(*)
FROM evanston311
GROUP BY street
ORDER BY count(*) DESC
LIMIT 5;

 * postgresql://postgres:***@localhost:5432/analysis
5 rows affected.


street,count
,1699
Chicago Avenue,1440
Sherman Avenue,1276
Central Street,1211
Davis Street,1154


### Exploring Text

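#### Trimming

Some of the street values in evanston311 include house numbers with # or / in them. In addition, some street values end in a period (.).

Remove the house numbers, extra punctuation, and any spaces from the beginning and end of the street values. Note that trim() only strips the listed characters from the two ends of a string and stops at the first character outside that set, which is why letter suffixes such as the B in 1047B remain in the output below.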

In [27]:
%%sql
SELECT DISTINCT street,
       trim(street, '0123456789 #/.') AS cleaned_street
FROM evanston311
ORDER BY street
LIMIT 5;

 * postgresql://postgres:***@localhost:5432/analysis
5 rows affected.


street,cleaned_street
1/2 Chicago Ave,Chicago Ave
1047B Chicago Ave,B Chicago Ave
13th Street,th Street
141A Callan Ave,A Callan Ave
141b Callan Ave,b Callan Ave
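
The leftover fragments in this output (B Chicago Ave, th Street) come from how trimming by character set works. Python's `str.strip(chars)` follows the same rule as the two-argument `trim()`, so it can serve as a rough model. The helper name `trim_street` and the sample `Chicago Ave.` are made up for illustration; the other samples come from the output above.

```python
# The character set used for trimming: digits, space, '#', '/', and '.'.
STRIP_CHARS = "0123456789 #/."


def trim_street(street):
    """Mimic trim(street, chars): strip the listed characters from both ends only."""
    return street.strip(STRIP_CHARS)


samples = ["1/2 Chicago Ave", "1047B Chicago Ave", "13th Street", "Chicago Ave."]
cleaned = [trim_street(s) for s in samples]

for before, after in zip(samples, cleaned):
    print(f"{before!r} -> {after!r}")
```

Stripping stops at the first character that is not in the set, so a letter directly after the house number (B, th) blocks any further trimming from that end.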


Use ILIKE to count rows in evanston311 where the description contains 'trash' or 'garbage' regardless of case

In [28]:
%%sql
SELECT count(*)
FROM evanston311
WHERE description ILIKE '%trash%' 
    OR description ILIKE '%garbage%';

 * postgresql://postgres:***@localhost:5432/analysis
1 rows affected.


count
2551
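
To make the difference between LIKE and ILIKE concrete, the sketch below translates a pattern into a regular expression and matches it with and without case sensitivity. It is a simplified model, not PostgreSQL's implementation: the helper names are hypothetical, escape characters are not handled, and a NULL value is reported as a non-match (in SQL the comparison yields NULL, which a WHERE clause also filters out).

```python
import re


def like_to_regex(pattern, case_insensitive=False):
    """Translate a LIKE pattern (% and _ wildcards, no escape handling) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # % matches any run of characters, including none
        elif ch == "_":
            parts.append(".")    # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    flags = re.DOTALL | (re.IGNORECASE if case_insensitive else 0)
    return re.compile("".join(parts), flags)


def like(value, pattern):
    """Case-sensitive match of the whole value, as with LIKE."""
    return value is not None and like_to_regex(pattern).fullmatch(value) is not None


def ilike(value, pattern):
    """Case-insensitive match of the whole value, as with ILIKE."""
    return value is not None and like_to_regex(pattern, True).fullmatch(value) is not None


print(ilike("TRASH was not picked up", "%trash%"))
print(like("TRASH was not picked up", "%trash%"))
```

Because `%trash%` matches the substring anywhere, words such as "trashcan" are counted as well.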


The category values are in title case. Use LIKE to find category values with 'Trash' or 'Garbage' in them.

In [33]:
%%sql
SELECT category
FROM evanston311
WHERE category LIKE '%Trash%' 
OR category LIKE '%Garbage%'
LIMIT 5;

 * postgresql://postgres:***@localhost:5432/analysis
5 rows affected.


category
THIS REQUEST IS INACTIVE...Trash Cart - Compost Bin
Trash - Tire Pickup
Trash - Special Pickup - Resident Use
"Trash, Recycling, Yard Waste Cart- Repair/Replacement"
"Trash, Recycling, Yard Waste Cart- Repair/Replacement"


Count rows where the description includes 'trash' or 'garbage' but the category does not.

In [34]:
%%sql
SELECT count(*)
FROM evanston311 
WHERE (description ILIKE '%trash%'
    OR description ILIKE '%garbage%') 
   AND category NOT LIKE '%Trash%'
   AND category NOT LIKE '%Garbage%';

 * postgresql://postgres:***@localhost:5432/analysis
1 rows affected.


count
570


Find the most common categories for rows whose description mentions trash or garbage but whose category is not trash-related.

In [36]:
%%sql
SELECT category, count(*)
  FROM evanston311 
 WHERE (description ILIKE '%trash%'
    OR description ILIKE '%garbage%') 
   AND category NOT LIKE '%Trash%'
   AND category NOT LIKE '%Garbage%'
 GROUP BY category
 ORDER BY count(*) DESC
 LIMIT 10;

 * postgresql://postgres:***@localhost:5432/analysis
10 rows affected.


category,count
Ask A Question / Send A Message,273
Rodents- Rats,77
Recycling - Missed Pickup,28
Dead Animal on Public Property,16
Graffiti,15
Yard Waste - Missed Pickup,14
Public Transit Agency Issue,13
Food Establishment - Unsanitary Conditions,13
Exterior Conditions,10
Street Sweeping,9


### Strings

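House number (house_num) and street are stored in two separate columns in evanston311. Concatenate them with concat(), putting a space between the values. Because concat() treats a NULL argument as an empty string, a row with no house number would start with a space; ltrim() removes it.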

In [38]:
%%sql
SELECT ltrim(concat(house_num, ' ', street)) AS address
FROM evanston311
limit 5;

 * postgresql://postgres:***@localhost:5432/analysis
5 rows affected.


address
606-612 Sheridan Road
930 Washington St
1183-1223 Lincoln St
1–111 Callan Ave
1524 Crain St
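
The role of `ltrim()` in this query is easier to see with a small model of how NULLs behave during concatenation. In the sketch below `None` stands in for NULL, and the helper names `pg_concat` and `pg_pipe_concat` are hypothetical; they only imitate the behaviour of `concat()` and the `||` operator.

```python
def pg_concat(*args):
    """Mimic concat(): NULL (None) arguments are treated as empty strings."""
    return "".join("" if a is None else str(a) for a in args)


def pg_pipe_concat(left, right):
    """Mimic the || operator: a NULL on either side makes the result NULL."""
    if left is None or right is None:
        return None
    return left + right


with_number = pg_concat("930", " ", "Washington St")
without_number = pg_concat(None, " ", "Chicago Avenue")   # leading space remains
trimmed = without_number.lstrip(" ")                      # what ltrim() removes

print(repr(with_number), repr(without_number), repr(trimmed))
print(pg_pipe_concat(None, " Chicago Avenue"))
```

With `||` instead of `concat()`, a missing house number would turn the whole address into NULL rather than leaving a stray space.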


#### Split strings on a delimiter

The street suffix is the part of the street name that gives the type of street, such as Avenue, Road, or Street. In the Evanston 311 data, sometimes the street suffix is the full word, while other times it is the abbreviation.

Extract just the first word of each street value to find the most common streets regardless of the suffix.

In [39]:
%%sql
SELECT split_part(street, ' ', 1) AS street_name, 
       count(*)
FROM evanston311
GROUP BY street_name
ORDER BY count DESC
LIMIT 20;

 * postgresql://postgres:***@localhost:5432/analysis
20 rows affected.


street_name,count
,1699
Chicago,1569
Central,1529
Sherman,1479
Davis,1248
Church,1225
Main,880
Sheridan,842
Ridge,823
Dodge,816
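
Grouping on the first word merges suffix variants such as Avenue and Ave, but it also merges different streets that happen to share a first word. The sketch below imitates `split_part()` for positive field numbers only and tallies a handful of made-up street values; the Python function is a hypothetical model, not the PostgreSQL implementation.

```python
from collections import Counter


def split_part(value, delimiter, n):
    """Mimic split_part() for positive n: 1-based lookup, '' when the field is missing."""
    if value is None:
        return None  # NULL in, NULL out
    fields = value.split(delimiter)
    return fields[n - 1] if 1 <= n <= len(fields) else ""


# Made-up street values, including a missing one.
streets = [
    "Chicago Avenue",
    "Chicago Ave",
    "Central Street",
    "Central Park Avenue",
    None,
]

counts = Counter(split_part(s, " ", 1) for s in streets)
print(counts.most_common())
```

Here "Central Street" and "Central Park Avenue" both land under Central, so first-word counts are best read as an upper bound for any single street.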


Select the first 50 characters of description, followed by '...' when the text is longer than that, for descriptions that start with the word "I".

In [41]:
%%sql
SELECT CASE WHEN length(description) > 50
            THEN left(description, 50) || '...'
       ELSE description
       END
FROM evanston311
WHERE description LIKE 'I %'
ORDER BY description
LIMIT 10;

 * postgresql://postgres:***@localhost:5432/analysis
10 rows affected.


description
I work for Schermerhorn & Co. and manage this con...
I accidentally mistyped my license plate number - ...
I accidentally sent the wrong cover letter on my a...
I acquired c diff at north shore hospital in Evans...
I am a 35 year resident of Evanston (314 Custer Av...
I am a business owner at 1121 Emerson St at the co...
I am a Cubs fan and watched game seven. But using ...
"I am a current customer at 1333 Maple Ave, Unit 2E..."
I am a day care worker at the family center at the...
I am a Northwestern student that has accumulated t...
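
The CASE expression in this query can be read as a small truncation rule. The sketch below restates it in Python, with a hypothetical helper named `preview` and made-up sample text; `None` stands in for NULL.

```python
def preview(text, width=50):
    """Return text unchanged if it fits, else its first `width` characters plus '...'."""
    if text is None:
        return None
    return text[:width] + "..." if len(text) > width else text


def starts_with_word_i(text):
    """Mirror LIKE 'I %': the letter I followed by a space, so 'It' and 'I'm' are excluded."""
    return text is not None and text.startswith("I ")


short = "I agree"
long_text = "I " + "x" * 60   # 62 characters, made up for illustration

print(preview(short))
print(preview(long_text))
```

The length check matters: without it, descriptions shorter than 50 characters would also get a trailing '...' even though nothing was cut.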
