## Converting Data

- Convert data stored in a `PostgreSQL` database from one data type to another. 
- Explore the expressions needed to convert `text` to `numeric` types and how to format `strings` for `temporal` data.

- **Note:**
    - 12AM is midnight
    - 12PM is noon
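
The 12-hour convention in the note above can be checked with a short Python sketch. The helper name `parse_12h` is hypothetical and not part of the notebook; it relies only on the standard `datetime.strptime` format codes `%I` (12-hour clock), `%M` (minute) and `%p` (AM/PM marker), and assumes the default English locale for the marker.

```python
from datetime import datetime, time


def parse_12h(text: str) -> time:
    """Parse a 12-hour clock string such as '1200AM' into a time value."""
    # %I = hour on a 12-hour clock, %M = minute, %p = AM/PM marker
    return datetime.strptime(text, "%I%M%p").time()


print(parse_12h("1200AM"))  # 00:00:00 (midnight)
print(parse_12h("1200PM"))  # 12:00:00 (noon)
```
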


In [2]:
cursor.execute("""SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'""")
print('Table in Database:\n')
for table in cursor.fetchall():
    print(table)

Table in Database:

('film_permit',)
('parking_violation',)


### Type conversion with a CASE clause
- One of the `parking_violation` attributes included for each record is the vehicle's location with respect to the street address of the violation.
    - An `F` value in the `violation_in_front_of_or_opposite` column indicates the vehicle was in front of the recorded address.
    - An `O` value indicates the vehicle was on the opposite side of the street.
    - The column uses the `TEXT` type to represent these values, but the same information could be captured with a `BOOLEAN` (true/false) value, which uses less memory.

- **Task:** convert `violation_in_front_of_or_opposite` to a `BOOLEAN` column named `is_violation_in_front` using a CASE clause.
    - `true` for records that occur in front of the recorded address and 
    - `false` for records that occur opposite of the recorded address.

In [3]:
%%sql
SELECT
    CASE WHEN
            -- Use true when column value is 'F'
            violation_in_front_of_or_opposite = 'F' THEN true
         WHEN
            -- Use false when column value is 'O'
            violation_in_front_of_or_opposite = 'O' THEN false
         ELSE
            NULL
    END AS is_violation_in_front
FROM
  parking_violation
LIMIT 10;

 * postgresql://postgres:***@localhost:5432/NYC_Open_Data
10 rows affected.


is_violation_in_front
True
True
True
False
True
True
True
True
""
""
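
As a rough illustration of the three-way mapping performed by the `CASE` clause, the same logic can be sketched in Python. The function name `is_violation_in_front` mirrors the column alias but is otherwise hypothetical; `None` plays the role of SQL `NULL` for any value other than `F` or `O`.

```python
from typing import Optional


def is_violation_in_front(code: Optional[str]) -> Optional[bool]:
    """Map 'F' to True, 'O' to False, and anything else to None."""
    if code == "F":
        return True
    if code == "O":
        return False
    # Mirrors the ELSE NULL branch: unknown or missing codes stay undefined
    return None


print([is_violation_in_front(c) for c in ["F", "O", None, ""]])
```
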


### Applying aggregate functions to converted values
- Converting a column's values from `TEXT` to a number allows calculations to be performed with aggregate functions.
- `summons_number` is of type `TEXT` in the `parking_violation` dataset.
- The maximum (`MAX(summons_number)`) and minimum (`MIN(summons_number)`) of the `TEXT` representation can still be calculated, but the values are compared as text rather than as numbers.
- However, the size of the range (max - min) of `summons_number` values cannot be calculated, because subtraction is not defined for `TEXT` types.

> => Converting `summons_number` to a `BIGINT` resolves this problem.
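
The difference between text and numeric aggregation can be illustrated with a small Python sketch. The sample values are made up for illustration (only the last one appears in the dataset output), and Python strings stand in for `TEXT` while `int` stands in for `BIGINT`.

```python
# Hypothetical summons numbers stored as text
summons_numbers = ["999", "1000", "1447152396"]

# Text values compare character by character, so "999" ranks highest
text_max = max(summons_numbers)
text_min = min(summons_numbers)

# Converting to integers gives the numeric ordering and allows subtraction
numeric = [int(s) for s in summons_numbers]
range_size = max(numeric) - min(numeric)

print(text_max, text_min)  # 999 1000
print(range_size)          # 1447151397
```
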



In [4]:

%%sql
SELECT
        MAX(summons_number::BIGINT) - MIN(summons_number::BIGINT) AS range_size
FROM
      parking_violation;

 * postgresql://postgres:***@localhost:5432/NYC_Open_Data
1 rows affected.


range_size
2954656568


### Date parsing and formatting
#### Cleaning invalid dates
- The `date_first_observed` column in the `parking_violation` dataset represents the date when the parking violation was first observed by the individual recording the violation. 
- However, not all `date_first_observed` values were recorded properly. Some records contain a `0` in this column, which cannot be interpreted as a `DATE` automatically because its meaning in this context is ambiguous. The column values require cleaning before they can be converted to a proper `DATE` column.

 > => Convert the `0` values in `date_first_observed` to `NULL` before casting the column to `DATE`.
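
The cleaning rule can be sketched in Python as follows. The helper `clean_date` is hypothetical; it assumes the raw values use the `YYYYMMDD` layout seen in the query output and treats `'0'` the way `NULLIF(date_first_observed, '0')` does, by turning it into a missing value.

```python
from datetime import date, datetime
from typing import Optional


def clean_date(raw: Optional[str]) -> Optional[date]:
    """Return a date for 'YYYYMMDD' strings and None for the '0' placeholder."""
    # Equivalent in spirit to NULLIF(raw, '0'): the placeholder becomes missing
    if raw is None or raw == "0":
        return None
    return datetime.strptime(raw, "%Y%m%d").date()


print(clean_date("0"))         # None
print(clean_date("20190707"))  # 2019-07-07
```
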

In [5]:
%%sql
SELECT date_first_observed, count(*)
FROM parking_violation
GROUP BY date_first_observed
ORDER BY count DESC

 * postgresql://postgres:***@localhost:5432/NYC_Open_Data
33 rows affected.


date_first_observed,count
0,4919
20190707,7
20190625,6
20190702,6
20190621,6
20190701,4
20190614,4
20190628,3
20190706,3
20190630,3


In [6]:
%%sql
With sub AS (SELECT
   DATE(NULLIF(date_first_observed, '0')) AS date_first_observed
FROM
   parking_violation)

SELECT date_first_observed, COUNT(*) AS count 
FROM sub 
GROUP BY date_first_observed
ORDER BY count DESC


 * postgresql://postgres:***@localhost:5432/NYC_Open_Data
33 rows affected.


date_first_observed,count
,4919
2019-07-07,7
2019-07-02,6
2019-06-25,6
2019-06-21,6
2019-06-14,4
2019-07-01,4
2019-06-26,3
2019-06-30,3
2019-06-28,3


In [7]:
%%sql
SELECT
    summons_number,
    DATE(issue_date) AS issue_date,
    DATE(NULLIF(date_first_observed,'0')) AS date_first_observed
FROM
  parking_violation
LIMIT 10;

 * postgresql://postgres:***@localhost:5432/NYC_Open_Data
10 rows affected.


summons_number,issue_date,date_first_observed
1447152396,2019-06-28,
1447152554,2019-06-16,
1447295250,2019-06-30,
1447153340,2019-06-28,
1447153352,2019-07-06,
1447153790,2019-06-16,
1447153819,2019-06-15,
1447153820,2019-06-23,
1447153844,2019-06-30,
1413267488,2019-04-30,


In [8]:
%%sql
SELECT
  summons_number,
    TO_CHAR(issue_date, 'YYYYMMDD') AS issue_date,
    TO_CHAR(date_first_observed, 'YYYYMMDD') AS date_first_observed
FROM (
  SELECT
    summons_number,
    DATE(issue_date) AS issue_date,
    DATE(NULLIF(date_first_observed,'0')) AS date_first_observed
  FROM
    parking_violation
) sub

LIMIT 10;

 * postgresql://postgres:***@localhost:5432/NYC_Open_Data
10 rows affected.


summons_number,issue_date,date_first_observed
1447152396,20190628,
1447152554,20190616,
1447295250,20190630,
1447153340,20190628,
1447153352,20190706,
1447153790,20190616,
1447153819,20190615,
1447153820,20190623,
1447153844,20190630,
1413267488,20190430,
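
The formatting step performed by `TO_CHAR(..., 'YYYYMMDD')` has a close analogue in Python's `strftime`. The sketch below uses a hypothetical helper, `format_yyyymmdd`, and keeps missing values missing, matching the empty `date_first_observed` column in the output above.

```python
from datetime import date
from typing import Optional


def format_yyyymmdd(value: Optional[date]) -> Optional[str]:
    """Format a date as 'YYYYMMDD'; missing dates stay missing."""
    if value is None:
        return None
    # %Y = four-digit year, %m = zero-padded month, %d = zero-padded day
    return value.strftime("%Y%m%d")


print(format_yyyymmdd(date(2019, 6, 28)))  # 20190628
print(format_yyyymmdd(None))               # None
```
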


### Timestamp parsing and formatting


In [9]:
%%sql
SELECT 
    summons_number, violation_time,
    --CONCAT(12,substr(violation_time,3,2),'P') AS violation_time1
    CONCAT(12,substr(violation_time,3,2),'P') AS violation_time2
  
FROM parking_violation
WHERE --violation_time NOT SIMILAR TO '00%P'  AND
     --violation_time SIMILAR TO '00%A' AND
     violation_time SIMILAR TO '2%'
ORDER BY summons_number
LIMIT 10;

 * postgresql://postgres:***@localhost:5432/NYC_Open_Data
0 rows affected.


summons_number,violation_time,violation_time2


In [10]:
%%sql
 WITH sub AS (SELECT 
    summons_number, violation_time,
    CONCAT(12,substr(violation_time,3,2),'P') AS violation_time1,
    CONCAT(12,substr(violation_time,3,2),'P') AS violation_time2
FROM parking_violation
WHERE --violation_time NOT SIMILAR TO '00%P'  AND
      violation_time SIMILAR TO '00%A'
      --violation_time SIMILAR TO '2%'
             )

--UPDATE parking_violation pv
--SET 
--    violation_time = violation_time1
--    --violation_time = violation_time2
--FROM sub
--
--WHERE pv.summons_number = sub.summons_number;

SELECT summons_number, violation_time
FROM parking_violation
WHERE summons_number = '1309081189'
LIMIT 10;

 * postgresql://postgres:***@localhost:5432/NYC_Open_Data
1 rows affected.


summons_number,violation_time
1309081189,1250PM


In [11]:
%%sql
SELECT 
    violation_time,
    violation_time1
FROM
        (SELECT 
            summons_number, violation_time,
            CONCAT(violation_time,'M') AS violation_time1
        FROM parking_violation
        WHERE violation_time SIMILAR TO '%[A-Z]' 
            
        )sub
        
 WHERE violation_time1 SIMILAR TO '00%'
    OR violation_time SIMILAR TO '1[3-9]%'
    OR violation_time SIMILAR TO '2[0-4]%';

 * postgresql://postgres:***@localhost:5432/NYC_Open_Data
0 rows affected.


violation_time,violation_time1


In [12]:
%%sql
SELECT
    violation_time,
    TO_TIMESTAMP(violation_time1, 'HH12MIPM')::TIME AS violation_time1
FROM   (SELECT 
            summons_number,
            violation_time,
            CONCAT(violation_time,'M') AS violation_time1
        FROM 
            (SELECT 
                summons_number,
                CASE 
                    WHEN violation_time ='0059P' THEN '1259P'
                    WHEN violation_time ='1955P' THEN '0755P'
                    WHEN violation_time ='1634P' THEN '0434P'
                    ELSE violation_time END AS violation_time
            FROM 
               parking_violation)sub1
        WHERE violation_time SIMILAR TO '%[A-Z]' 
              AND violation_time IS NOT NULL) sub2
WHERE violation_time1 IS NOT NULL
LIMIT 20;

 * postgresql://postgres:***@localhost:5432/NYC_Open_Data
20 rows affected.


violation_time,violation_time1
1000AM,10:00:00
0107AM,01:07:00
0410AM,04:10:00
0935AM,09:35:00
1217PM,12:17:00
0124AM,01:24:00
0500AM,05:00:00
1221AM,00:21:00
1241AM,00:41:00
1110AM,11:10:00
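
The `CASE` expression above patches three specific values by hand. A more general rule is sketched below in Python, under stated assumptions: the helper `normalize_violation_time` is hypothetical, it accepts `HHMM` followed by `A`, `P`, `AM` or `PM`, it shifts 24-hour values carrying a `P` marker back into the 12-hour range (as the `1955P` to `0755P` fix does), and it reads hour `00` with `P` as 12 PM (as the `0059P` to `1259P` fix does). Values without a marker, such as `0108`, are left unresolved.

```python
import re
from datetime import time
from typing import Optional

# Two hour digits, two minute digits, then A/P with an optional trailing M
_TIME_PATTERN = re.compile(r"^(\d{2})(\d{2})([AP])M?$")


def normalize_violation_time(raw: Optional[str]) -> Optional[time]:
    """Convert strings like '0935A' or '1955P' to a time, or None if unclear."""
    if raw is None:
        return None
    match = _TIME_PATTERN.match(raw)
    if match is None:
        return None  # no AM/PM marker, so the hour is ambiguous
    hour, minute, marker = int(match.group(1)), int(match.group(2)), match.group(3)
    if marker == "P" and 13 <= hour <= 23:
        hour -= 12  # 24-hour value with a PM marker, e.g. '1955P'
    elif marker == "P" and hour == 0:
        hour = 12   # '0059P' is read as 12:59 PM
    if not 1 <= hour <= 12 or minute > 59:
        return None
    # 12 AM maps to hour 0 and 12 PM to hour 12 on the 24-hour clock
    return time(hour % 12 + (12 if marker == "P" else 0), minute)


for sample in ["0059P", "1955P", "1221AM", "0108"]:
    print(sample, normalize_violation_time(sample))
```
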


In [13]:
%%sql

WITH sub AS (

SELECT 
            summons_number,
            violation_time,
        CONCAT(violation_time,'M') AS violation_time1
        FROM 
            (SELECT 
                summons_number,
                CASE 
                    WHEN violation_time ='0059P' THEN '1259P'
                    WHEN violation_time ='1955P' THEN '0755P'
                    WHEN violation_time ='1634P' THEN '0434P'
                    ELSE violation_time END AS violation_time
            FROM 
               parking_violation)sub1
        WHERE violation_time SIMILAR TO '%[A-Z]' 
              AND violation_time IS NOT NULL
)

--UPDATE parking_violation pv
--
--SET violation_time =violation_time1
--FROM sub
--WHERE pv.summons_number=sub.summons_number;
--
SELECT violation_time FROM parking_violation LIMIT 10

 * postgresql://postgres:***@localhost:5432/NYC_Open_Data
10 rows affected.


violation_time
1000AM
0107AM
0410AM
0935AM
1217PM
0124AM
0500AM
1221AM
1241AM
0108


In [14]:
%%sql
SELECT violation_time FROM parking_violation
WHERE violation_time NOT LIKE '%M'

 * postgresql://postgres:***@localhost:5432/NYC_Open_Data
3 rows affected.


violation_time
108
855
1011


In [15]:
%%sql
SELECT
  -- Populate column with violation_time hours
  EXTRACT('hour' FROM violation_time) AS hour,
  COUNT(*)
FROM (
    SELECT
      TO_TIMESTAMP(violation_time1, 'HH12MIPM')::TIME as violation_time
    FROM
      (SELECT 
            summons_number,
            violation_time,
            CONCAT(violation_time,'M') AS violation_time1
        FROM 
            (SELECT 
                summons_number,
                CASE 
                    WHEN violation_time ='0059P' THEN '1259P'
                    WHEN violation_time ='1955P' THEN '0755P'
                    WHEN violation_time ='1634P' THEN '0434P'
                    ELSE violation_time END AS violation_time
            FROM 
               parking_violation)sub1
        WHERE violation_time SIMILAR TO '%[A-Z]' 
              AND violation_time IS NOT NULL) sub2
    WHERE
      violation_time IS NOT NULL
) sub
GROUP BY
  hour
ORDER BY
  hour

 * postgresql://postgres:***@localhost:5432/NYC_Open_Data
24 rows affected.


hour,count
0.0,136
1.0,242
2.0,214
3.0,149
4.0,150
5.0,122
6.0,107
7.0,145
8.0,270
9.0,319


### A parking violation report by day of the month
- One hypothesis is that parking tickets are more likely to be given out at the end of the month than earlier in the month.
- To get a sense of the distribution of tickets by day of the month, prepare the data in two steps:
    - Convert the strings representing `issue_date` into proper PostgreSQL `DATE` values.
    - Extract the day of the month needed to produce the distribution of violations by day of the month.
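
The two preparation steps can be sketched in Python on a handful of values. The sample dates are the first five `issue_date` values shown in this notebook's output, written in ISO format; the raw string layout in the table is not shown, so the `%Y-%m-%d` format and the helper name `day_of_month_counts` are assumptions.

```python
from collections import Counter
from datetime import datetime

# First five issue dates from the notebook output, assumed to be ISO strings
issue_dates = ["2019-06-28", "2019-06-16", "2019-06-30", "2019-06-28", "2019-07-06"]


def day_of_month_counts(values):
    """Count violations per day of the month, ordered by day."""
    # Step 1: parse each string into a date; step 2: keep only the day number
    days = (datetime.strptime(v, "%Y-%m-%d").date().day for v in values)
    return dict(sorted(Counter(days).items()))


print(day_of_month_counts(issue_dates))  # {6: 1, 16: 1, 28: 2, 30: 1}
```
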

In [16]:
%%sql
SELECT
  -- Convert issue_date to a DATE value
  issue_date::DATE AS issue_date
FROM
  parking_violation
LIMIT 10;

 * postgresql://postgres:***@localhost:5432/NYC_Open_Data
10 rows affected.


issue_date
2019-06-28
2019-06-16
2019-06-30
2019-06-28
2019-07-06
2019-06-16
2019-06-15
2019-06-23
2019-06-30
2019-04-30


In [17]:
%%sql

SELECT
  -- Create issue_day from the day value of issue_date
  EXTRACT(DAY FROM issue_date) AS issue_day,
  -- Include the count of violations for each day
  COUNT(*)
FROM (
  SELECT
    -- Convert issue_date to a `DATE` value
    DATE(issue_date) AS issue_date
  FROM
    parking_violation
) sub
GROUP BY
  issue_day
ORDER BY
  issue_day;

 * postgresql://postgres:***@localhost:5432/NYC_Open_Data
31 rows affected.


issue_day,count
1.0,153
2.0,161
3.0,154
4.0,198
5.0,205
6.0,171
7.0,148
8.0,123
9.0,120
10.0,86


### Risky parking behavior
- The `parking_violation` table contains many parking violation details. However, it's unclear what causes an individual to violate parking restrictions. One hypothesis is that violators attempt to park in restricted areas just before the parking restrictions end.

- Convert `violation_time` and `to_hours_in_effect` to `TIMESTAMP` values for violations that took place in locations with partial-day restrictions.
- Calculate the interval between `violation_time` and `to_hours_in_effect` for these records.
- Identify the records where `violation_time` is less than 1 hour before `to_hours_in_effect`.
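
One compact way to express the interval calculation is modular arithmetic on minutes since midnight, sketched below in Python. This is not the approach taken in the SQL cells of this notebook; the helpers `minutes_until` and `is_risky` are hypothetical, and the sketch measures the time to the next occurrence of the restriction end without checking whether the violation falls inside the restricted window.

```python
from datetime import time

MINUTES_PER_DAY = 24 * 60


def minutes_until(violation: time, restriction_end: time) -> int:
    """Minutes from the violation to the next occurrence of restriction_end."""
    start = violation.hour * 60 + violation.minute
    end = restriction_end.hour * 60 + restriction_end.minute
    # The modulo wraps intervals that cross midnight (e.g. 22:22 -> 05:00)
    return (end - start) % MINUTES_PER_DAY


def is_risky(violation: time, restriction_end: time) -> bool:
    """True when the violation is less than one hour before the end time."""
    return 0 < minutes_until(violation, restriction_end) < 60


print(minutes_until(time(7, 58), time(8, 0)))   # 2
print(minutes_until(time(22, 22), time(5, 0)))  # 398
print(is_risky(time(7, 58), time(8, 0)))        # True
```
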

In [42]:
%%sql
SELECT to_hours_in_effect, COUNT(*) 
        FROM 
            (SELECT 
                summons_number,
                CASE 
                    WHEN to_hours_in_effect ='0000' THEN '1200A'
                    WHEN to_hours_in_effect ='ALL' THEN '1159P'
                    WHEN to_hours_in_effect ='1600P' THEN '0400P'
                    WHEN to_hours_in_effect ='1800P' THEN '0600P'
                    WHEN to_hours_in_effect ='2000P' THEN '0800P'
                    --WHEN length(to_hours_in_effect) != 6 THEN NULL 
                    ELSE to_hours_in_effect END AS to_hours_in_effect
            FROM 
               parking_violation)sub1
GROUP BY to_hours_in_effect
ORDER BY to_hours_in_effect

 * postgresql://postgres:***@localhost:5432/NYC_Open_Data
34 rows affected.


to_hours_in_effect,count
0100PM,41
0130PM,8
0200PM,29
0300AM,14
0300PM,1
0400AM,3
0400PM,23
0500AM,8
0500PM,12
0600AM,57


In [30]:
%%sql
SELECT DISTINCT to_hours_in_effect
FROM parking_violation
WHERE  to_hours_in_effect SIMILAR TO '00%'
    OR to_hours_in_effect SIMILAR TO '1[3-9]%'
    OR to_hours_in_effect SIMILAR TO '2[0-4]%';


 * postgresql://postgres:***@localhost:5432/NYC_Open_Data
3 rows affected.


to_hours_in_effect
1600PM
1800PM
2000PM
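
The three `SIMILAR TO` patterns above flag values whose hour part cannot belong to a 12-hour clock. A rough Python equivalent using a single regular expression is sketched here; the name `has_suspect_hour` is hypothetical.

```python
import re

# Hours 00, 13-19 and 20-24 are not valid on a 12-hour clock
SUSPECT_HOUR = re.compile(r"^(00|1[3-9]|2[0-4])")


def has_suspect_hour(value: str) -> bool:
    """True when the leading two digits are outside the 01-12 range."""
    return SUSPECT_HOUR.match(value) is not None


values = ["1600PM", "1800PM", "2000PM", "0400PM", "1200AM"]
print([v for v in values if has_suspect_hour(v)])  # ['1600PM', '1800PM', '2000PM']
```
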


In [45]:
%%sql
WITH SUB AS (SELECT 
            summons_number,
            to_hours_in_effect,
             -- Append 'M' to values that end in a bare 'A' or 'P' marker
             CASE WHEN to_hours_in_effect SIMILAR TO '%[AP]' 
                    AND to_hours_in_effect NOT SIMILAR TO '%M' THEN CONCAT(to_hours_in_effect,'M') 
                  ELSE to_hours_in_effect
             END AS to_hours_in_effect1
        FROM 
            (SELECT 
                summons_number,
                CASE 
                    WHEN to_hours_in_effect ='0000' THEN '1159P'
                    WHEN to_hours_in_effect ='ALL' THEN '1159P'
                    WHEN to_hours_in_effect ='1600P' THEN '0400P'
                    WHEN to_hours_in_effect ='1800P' THEN '0600P'
                    WHEN to_hours_in_effect ='2000P' THEN '0800P'
                    --WHEN length(to_hours_in_effect) != 5 THEN NULL
                    ELSE to_hours_in_effect END AS to_hours_in_effect
            FROM 
               parking_violation)sub1
        WHERE  to_hours_in_effect IS NOT NULL)

UPDATE parking_violation pv

SET to_hours_in_effect=to_hours_in_effect1
FROM SUB
WHERE pv.summons_number=SUB.summons_number;

SELECT to_hours_in_effect, COUNT(*) FROM parking_violation
GROUP BY to_hours_in_effect
ORDER BY to_hours_in_effect

 * postgresql://postgres:***@localhost:5432/NYC_Open_Data
5000 rows affected.
30 rows affected.


to_hours_in_effect,count
0100PM,41
0130PM,8
0200PM,29
0300AM,14
0300PM,1
0400AM,3
0400PM,26
0500AM,8
0500PM,12
0600AM,57


In [47]:
%%sql
SELECT DISTINCT from_hours_in_effect
FROM parking_violation
WHERE  from_hours_in_effect SIMILAR TO '00%'
    OR from_hours_in_effect SIMILAR TO '1[3-9]%'
    OR from_hours_in_effect SIMILAR TO '2[0-4]%'
    OR from_hours_in_effect NOT SIMILAR TO '0%';


 * postgresql://postgres:***@localhost:5432/NYC_Open_Data
15 rows affected.


from_hours_in_effect
ALL
1159P
0000
1200A
1136A
1030A
1130
1100P
1100A
1130A


In [55]:
%%sql
SELECT 
                summons_number,
                from_hours_in_effect
            FROM 
               parking_violation
            WHERE length(from_hours_in_effect) != 6

 * postgresql://postgres:***@localhost:5432/NYC_Open_Data
5 rows affected.


summons_number,from_hours_in_effect
1422683254,800
1454070122,800
1454093195,1130
1454169126,700
1452145076,800


In [61]:
%%sql
WITH SUB AS (SELECT 
            summons_number,
            from_hours_in_effect,
             CASE WHEN from_hours_in_effect SIMILAR TO '%[AP]' AND
                       from_hours_in_effect NOT SIMILAR TO '%M' THEN CONCAT(from_hours_in_effect,'M') 
                  ELSE from_hours_in_effect
             END AS from_hours_in_effect1
        FROM 
            (SELECT 
                summons_number,
                CASE 
                    WHEN from_hours_in_effect ='0000' THEN '1200AM'
                    WHEN from_hours_in_effect ='ALL' THEN '1200AM'
                    WHEN length(from_hours_in_effect) != 6 THEN NULL
                    ELSE from_hours_in_effect END AS from_hours_in_effect
            FROM 
               parking_violation)sub1
        --WHERE  from_hours_in_effect IS NOT NULL
            )

UPDATE parking_violation pv

SET from_hours_in_effect=from_hours_in_effect1
FROM SUB
WHERE pv.summons_number=SUB.summons_number;

SELECT from_hours_in_effect, COUNT(*) FROM parking_violation
GROUP BY from_hours_in_effect
ORDER BY from_hours_in_effect

 * postgresql://postgres:***@localhost:5432/NYC_Open_Data
5000 rows affected.
28 rows affected.


from_hours_in_effect,count
0100AM,30
0200PM,3
0300AM,11
0400PM,5
0500PM,1
0600AM,7
0700AM,152
0700PM,1
0730AM,36
0800AM,68


In [82]:
%%sql


SELECT 
    TO_TIMESTAMP(from_hours_in_effect, 'HH12MIPM')::time as from_hours_in_effect,
    TO_TIMESTAMP(to_hours_in_effect, 'HH12MIPM')::time as to_hours_in_effect,
    TO_TIMESTAMP(violation_time, 'HH12MIPM')::time as violation_time,
    COUNT(*)
FROM parking_violation
WHERE NOT (from_hours_in_effect = '1200AM' AND to_hours_in_effect ='1159PM')
GROUP BY from_hours_in_effect,to_hours_in_effect,violation_time

ORDER BY from_hours_in_effect,to_hours_in_effect,violation_time, COUNT(*)
LIMIT 10;

 * postgresql://postgres:***@localhost:5432/NYC_Open_Data
10 rows affected.


from_hours_in_effect,to_hours_in_effect,violation_time,count
00:00:00,03:00:00,00:35:00,1
00:00:00,03:00:00,00:46:00,1
00:00:00,03:00:00,00:58:00,1
00:00:00,03:00:00,01:00:00,1
00:00:00,03:00:00,01:02:00,1
00:00:00,03:00:00,01:03:00,1
00:00:00,03:00:00,01:07:00,1
00:00:00,03:00:00,01:16:00,1
00:00:00,03:00:00,01:17:00,1
00:00:00,03:00:00,01:20:00,1


In [223]:
%%sql
SELECT
          summons_number,
          from_hours_in_effect,
          violation_time,  
          to_hours_in_effect,
        
          CASE WHEN to_hours_in_effect > from_hours_in_effect
                AND to_hours_in_effect >= violation_time AND  violation_time >= from_hours_in_effect   
                 THEN  EXTRACT('hour' FROM to_hours_in_effect -  violation_time)
                
                WHEN to_hours_in_effect > from_hours_in_effect 
                AND violation_time NOT BETWEEN from_hours_in_effect AND to_hours_in_effect
                 THEN  EXTRACT('hour' FROM to_hours_in_effect -  violation_time)
                ELSE 0 
                END AS hours,
          
         CASE WHEN to_hours_in_effect >= violation_time THEN
                    EXTRACT('minute' FROM to_hours_in_effect - violation_time) 
               ELSE EXTRACT('minute' FROM to_hours_in_effect - violation_time) + 60
               END AS minutes
    FROM (
          SELECT
            summons_number,
                TO_TIMESTAMP(from_hours_in_effect, 'HH12MIPM')::time as from_hours_in_effect,
                TO_TIMESTAMP(to_hours_in_effect, 'HH12MIPM')::time as to_hours_in_effect,
                TO_TIMESTAMP(violation_time, 'HH12MIPM')::time as violation_time
          FROM
            parking_violation
          WHERE
            NOT (from_hours_in_effect = '1200AM' AND to_hours_in_effect = '1159PM') )sub

WHERE violation_time NOT BETWEEN from_hours_in_effect AND to_hours_in_effect
        AND to_hours_in_effect > from_hours_in_effect


 * postgresql://postgres:***@localhost:5432/NYC_Open_Data
10 rows affected.


summons_number,from_hours_in_effect,violation_time,to_hours_in_effect,hours,minutes
1439394799,07:00:00,06:40:00,19:00:00,0.0,20.0
1446646427,00:00:00,21:35:00,12:00:00,0.0,25.0
1454093201,11:30:00,00:10:00,13:00:00,0.0,50.0
1454170517,07:00:00,06:46:00,19:00:00,0.0,14.0
1451849291,00:01:00,12:43:00,12:00:00,0.0,17.0
1452145155,08:30:00,08:20:00,09:00:00,0.0,40.0
1452145600,11:30:00,00:07:00,13:00:00,0.0,53.0
1452145611,11:30:00,00:40:00,13:00:00,0.0,20.0
1452145623,11:30:00,00:45:00,13:00:00,0.0,15.0
1413267488,07:00:00,01:08:00,19:00:00,0.0,52.0


In [501]:
%%sql
SELECT
        summons_number,
        from_hours_in_effect,
        violation_time,  
        to_hours_in_effect,
        CASE 
            WHEN to_hours_in_effect > from_hours_in_effect
                AND to_hours_in_effect >= violation_time AND  violation_time >= from_hours_in_effect   
                 THEN  EXTRACT('hour' FROM to_hours_in_effect -  violation_time)               
            WHEN to_hours_in_effect > from_hours_in_effect 
                AND violation_time NOT BETWEEN from_hours_in_effect AND to_hours_in_effect
                 AND violation_time < '05:00:00'::time                  
                    THEN  24- EXTRACT('hour' FROM to_hours_in_effect) + EXTRACT('hour' FROM violation_time)
            WHEN to_hours_in_effect > from_hours_in_effect 
                AND violation_time NOT BETWEEN from_hours_in_effect AND to_hours_in_effect
                THEN  EXTRACT('hour' FROM to_hours_in_effect) - EXTRACT('hour' FROM violation_time)
            WHEN to_hours_in_effect < from_hours_in_effect
                    AND (violation_time > to_hours_in_effect AND violation_time < from_hours_in_effect)
                    THEN (- EXTRACT('hour' FROM violation_time ) + EXTRACT('hour' FROM to_hours_in_effect) )
            WHEN to_hours_in_effect < from_hours_in_effect
                     AND violation_time < to_hours_in_effect
                    THEN   EXTRACT('hour' FROM to_hours_in_effect) - EXTRACT('hour' FROM  violation_time)
            WHEN to_hours_in_effect < from_hours_in_effect
                     AND (violation_time > to_hours_in_effect)
                    THEN (24- EXTRACT('hour' FROM violation_time )+EXTRACT('hour' FROM to_hours_in_effect) )
            ELSE EXTRACT('hour' FROM to_hours_in_effect -  violation_time)     
            END AS hours,
        
        CASE 
            WHEN to_hours_in_effect > from_hours_in_effect
                AND to_hours_in_effect >= violation_time AND  violation_time >= from_hours_in_effect   
                 THEN  EXTRACT('minute' FROM to_hours_in_effect -  violation_time)               
            WHEN to_hours_in_effect > from_hours_in_effect 
                AND violation_time NOT BETWEEN from_hours_in_effect AND to_hours_in_effect
                 AND violation_time < '06:00:00'::time                  
                THEN  EXTRACT('minute' FROM to_hours_in_effect) + EXTRACT('minute' FROM violation_time)
            WHEN to_hours_in_effect > from_hours_in_effect 
                AND violation_time NOT BETWEEN from_hours_in_effect AND to_hours_in_effect
                THEN  EXTRACT('minute' FROM to_hours_in_effect) - EXTRACT('minute' FROM violation_time)
            WHEN to_hours_in_effect < from_hours_in_effect
                    AND (violation_time > to_hours_in_effect AND violation_time < from_hours_in_effect)
                   THEN (- EXTRACT('minute' FROM violation_time ) + EXTRACT('minute' FROM to_hours_in_effect) )
            WHEN to_hours_in_effect < from_hours_in_effect
                     AND violation_time < to_hours_in_effect
                    THEN   EXTRACT('minute' FROM to_hours_in_effect) - EXTRACT('minute' FROM  violation_time)
            WHEN to_hours_in_effect < from_hours_in_effect
                     AND (violation_time > to_hours_in_effect)
                    THEN (-EXTRACT('minute' FROM violation_time )+EXTRACT('minute' FROM to_hours_in_effect) )
            ELSE EXTRACT('minute' FROM to_hours_in_effect -  violation_time)
            END AS minutes          
    FROM 
        (SELECT
            summons_number,
                TO_TIMESTAMP(from_hours_in_effect, 'HH12MIPM')::time as from_hours_in_effect,
                TO_TIMESTAMP(to_hours_in_effect, 'HH12MIPM')::time as to_hours_in_effect,
                TO_TIMESTAMP(violation_time, 'HH12MIPM')::time as violation_time
          FROM
            parking_violation
          WHERE NOT (from_hours_in_effect = '1200AM' AND to_hours_in_effect = '1159PM')
        ) sub 


LIMIT 15;



 * postgresql://postgres:***@localhost:5432/NYC_Open_Data
15 rows affected.


summons_number,from_hours_in_effect,violation_time,to_hours_in_effect,hours,minutes
1447634159,20:00:00,22:22:00,05:00:00,7.0,-22.0
1447672290,22:00:00,03:50:00,07:00:00,4.0,-50.0
1447680091,22:00:00,04:40:00,07:00:00,3.0,-40.0
1447680133,22:00:00,04:12:00,07:00:00,3.0,-12.0
1447680194,22:00:00,04:30:00,07:00:00,3.0,-30.0
1422683254,,08:30:00,10:30:00,2.0,0.0
1434758126,07:00:00,08:33:00,19:00:00,10.0,27.0
4011361938,,20:34:00,10:30:00,-10.0,-4.0
4011361975,,20:48:00,10:30:00,-10.0,-18.0
4011362750,,10:20:00,10:30:00,0.0,10.0
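
The `hours` and `minutes` columns above are signed components, so they are only meaningful when combined as `hours * 60 + minutes`. The Python sketch below checks this against two rows from the output; the helper `total_minutes` is hypothetical.

```python
def total_minutes(hours: float, minutes: float) -> float:
    """Combine the signed hour and minute components into total minutes."""
    return hours * 60 + minutes


# Row 1447634159: violation at 22:22, restriction ends at 05:00 (6 h 38 min later)
print(total_minutes(7.0, -22.0))  # 398.0

# Row 1447672290: violation at 03:50, restriction ends at 07:00 (3 h 10 min later)
print(total_minutes(4.0, -50.0))  # 190.0
```
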


### Records where `violation_time` is less than 1 hour before `to_hours_in_effect`

In [519]:
%%sql

With diff_time AS (
SELECT
        summons_number,
        from_hours_in_effect,
        violation_time,  
        to_hours_in_effect,
        CASE 
            WHEN to_hours_in_effect > from_hours_in_effect
                AND to_hours_in_effect >= violation_time AND  violation_time >= from_hours_in_effect   
                 THEN  EXTRACT('hour' FROM to_hours_in_effect -  violation_time)               
            WHEN to_hours_in_effect > from_hours_in_effect 
                AND violation_time NOT BETWEEN from_hours_in_effect AND to_hours_in_effect
                 AND violation_time < '05:00:00'::time                  
                    THEN  24- EXTRACT('hour' FROM to_hours_in_effect) + EXTRACT('hour' FROM violation_time)
            WHEN to_hours_in_effect > from_hours_in_effect 
                AND violation_time NOT BETWEEN from_hours_in_effect AND to_hours_in_effect
                THEN  EXTRACT('hour' FROM to_hours_in_effect) - EXTRACT('hour' FROM violation_time)
            WHEN to_hours_in_effect < from_hours_in_effect
                    AND (violation_time > to_hours_in_effect AND violation_time < from_hours_in_effect)
                    THEN (- EXTRACT('hour' FROM violation_time ) + EXTRACT('hour' FROM to_hours_in_effect) )
            WHEN to_hours_in_effect < from_hours_in_effect
                     AND violation_time < to_hours_in_effect
                    THEN   EXTRACT('hour' FROM to_hours_in_effect) - EXTRACT('hour' FROM  violation_time)
            WHEN to_hours_in_effect < from_hours_in_effect
                     AND (violation_time > to_hours_in_effect)
                    THEN (24- EXTRACT('hour' FROM violation_time )+EXTRACT('hour' FROM to_hours_in_effect) )
            ELSE EXTRACT('hour' FROM to_hours_in_effect -  violation_time)     
            END AS hours,
        
        CASE 
            WHEN to_hours_in_effect > from_hours_in_effect
                AND to_hours_in_effect >= violation_time AND  violation_time >= from_hours_in_effect   
                 THEN  EXTRACT('minute' FROM to_hours_in_effect -  violation_time)               
            WHEN to_hours_in_effect > from_hours_in_effect 
                AND violation_time NOT BETWEEN from_hours_in_effect AND to_hours_in_effect
                 AND violation_time < '06:00:00'::time                  
                THEN  EXTRACT('minute' FROM to_hours_in_effect) + EXTRACT('minute' FROM violation_time)
            WHEN to_hours_in_effect > from_hours_in_effect 
                AND violation_time NOT BETWEEN from_hours_in_effect AND to_hours_in_effect
                THEN  EXTRACT('minute' FROM to_hours_in_effect) - EXTRACT('minute' FROM violation_time)
            WHEN to_hours_in_effect < from_hours_in_effect
                    AND (violation_time > to_hours_in_effect AND violation_time < from_hours_in_effect)
                   THEN (- EXTRACT('minute' FROM violation_time ) + EXTRACT('minute' FROM to_hours_in_effect) )
            WHEN to_hours_in_effect < from_hours_in_effect
                     AND violation_time < to_hours_in_effect
                    THEN   EXTRACT('minute' FROM to_hours_in_effect) - EXTRACT('minute' FROM  violation_time)
            WHEN to_hours_in_effect < from_hours_in_effect
                     AND (violation_time > to_hours_in_effect)
                    THEN (-EXTRACT('minute' FROM violation_time )+EXTRACT('minute' FROM to_hours_in_effect) )
            ELSE EXTRACT('minute' FROM to_hours_in_effect -  violation_time)
            END AS minutes          
    FROM 
        (SELECT
            summons_number,
                TO_TIMESTAMP(from_hours_in_effect, 'HH12MIPM')::time as from_hours_in_effect,
                TO_TIMESTAMP(to_hours_in_effect, 'HH12MIPM')::time as to_hours_in_effect,
                TO_TIMESTAMP(violation_time, 'HH12MIPM')::time as violation_time
          FROM
            parking_violation
          WHERE NOT (from_hours_in_effect = '1200AM' AND to_hours_in_effect = '1159PM')
        ) sub 
)

SELECT count(hours) AS violation_before_to_hours
FROM diff_time
WHERE
    hours*60 + minutes >  0
AND 
    hours*60 + minutes < 60



 * postgresql://postgres:***@localhost:5432/NYC_Open_Data
1 rows affected.


violation_before_to_hours
201


In [515]:
%%sql

With diff_time AS (
SELECT
        summons_number,
        from_hours_in_effect,
        violation_time,  
        to_hours_in_effect,
        CASE 
            WHEN to_hours_in_effect > from_hours_in_effect
                AND to_hours_in_effect >= violation_time AND  violation_time >= from_hours_in_effect   
                 THEN  EXTRACT('hour' FROM to_hours_in_effect -  violation_time)               
            WHEN to_hours_in_effect > from_hours_in_effect 
                AND violation_time NOT BETWEEN from_hours_in_effect AND to_hours_in_effect
                 AND violation_time < '05:00:00'::time                  
                    THEN  24- EXTRACT('hour' FROM to_hours_in_effect) + EXTRACT('hour' FROM violation_time)
            WHEN to_hours_in_effect > from_hours_in_effect 
                AND violation_time NOT BETWEEN from_hours_in_effect AND to_hours_in_effect
                THEN  EXTRACT('hour' FROM to_hours_in_effect) - EXTRACT('hour' FROM violation_time)
            WHEN to_hours_in_effect < from_hours_in_effect
                    AND (violation_time > to_hours_in_effect AND violation_time < from_hours_in_effect)
                    THEN (- EXTRACT('hour' FROM violation_time ) + EXTRACT('hour' FROM to_hours_in_effect) )
            WHEN to_hours_in_effect < from_hours_in_effect
                     AND violation_time < to_hours_in_effect
                    THEN   EXTRACT('hour' FROM to_hours_in_effect) - EXTRACT('hour' FROM  violation_time)
            WHEN to_hours_in_effect < from_hours_in_effect
                     AND (violation_time > to_hours_in_effect)
                    THEN (24- EXTRACT('hour' FROM violation_time )+EXTRACT('hour' FROM to_hours_in_effect) )
            ELSE EXTRACT('hour' FROM to_hours_in_effect -  violation_time)     
            END AS hours,
        
        CASE 
            WHEN to_hours_in_effect > from_hours_in_effect
                AND to_hours_in_effect >= violation_time AND  violation_time >= from_hours_in_effect   
                 THEN  EXTRACT('minute' FROM to_hours_in_effect -  violation_time)               
            WHEN to_hours_in_effect > from_hours_in_effect 
                AND violation_time NOT BETWEEN from_hours_in_effect AND to_hours_in_effect
                 AND violation_time < '06:00:00'::time                  
                THEN  EXTRACT('minute' FROM to_hours_in_effect) + EXTRACT('minute' FROM violation_time)
            WHEN to_hours_in_effect > from_hours_in_effect 
                AND violation_time NOT BETWEEN from_hours_in_effect AND to_hours_in_effect
                THEN  EXTRACT('minute' FROM to_hours_in_effect) - EXTRACT('minute' FROM violation_time)
            WHEN to_hours_in_effect < from_hours_in_effect
                    AND (violation_time > to_hours_in_effect AND violation_time < from_hours_in_effect)
                   THEN (- EXTRACT('minute' FROM violation_time ) + EXTRACT('minute' FROM to_hours_in_effect) )
            WHEN to_hours_in_effect < from_hours_in_effect
                     AND violation_time < to_hours_in_effect
                    THEN   EXTRACT('minute' FROM to_hours_in_effect) - EXTRACT('minute' FROM  violation_time)
            WHEN to_hours_in_effect < from_hours_in_effect
                     AND (violation_time > to_hours_in_effect)
                    THEN (-EXTRACT('minute' FROM violation_time )+EXTRACT('minute' FROM to_hours_in_effect) )
            ELSE EXTRACT('minute' FROM to_hours_in_effect -  violation_time)
            END AS minutes          
    FROM 
        (SELECT
            summons_number,
                TO_TIMESTAMP(from_hours_in_effect, 'HH12MIPM')::time as from_hours_in_effect,
                TO_TIMESTAMP(to_hours_in_effect, 'HH12MIPM')::time as to_hours_in_effect,
                TO_TIMESTAMP(violation_time, 'HH12MIPM')::time as violation_time
          FROM
            parking_violation
          WHERE NOT (from_hours_in_effect = '1200AM' AND to_hours_in_effect = '1159PM')
        ) sub 
)

SELECT summons_number, from_hours_in_effect, violation_time, to_hours_in_effect, hours, minutes
FROM diff_time
WHERE hours * 60 + minutes < 60
  AND hours * 60 + minutes > 0
ORDER BY hours, minutes;

 * postgresql://postgres:***@localhost:5432/NYC_Open_Data
201 rows affected.


summons_number,from_hours_in_effect,violation_time,to_hours_in_effect,hours,minutes
1452145880,07:30:00,07:58:00,08:00:00,0.0,2.0
1452145714,07:30:00,07:58:00,08:00:00,0.0,2.0
1447832279,09:00:00,10:55:00,11:00:00,0.0,5.0
1454078777,07:30:00,07:55:00,08:00:00,0.0,5.0
1437647479,11:30:00,12:55:00,13:00:00,0.0,5.0
1452146070,07:30:00,07:55:00,08:00:00,0.0,5.0
1451597010,07:30:00,07:55:00,08:00:00,0.0,5.0
1452145660,08:30:00,08:55:00,09:00:00,0.0,5.0
1452145702,07:30:00,07:55:00,08:00:00,0.0,5.0
1454090364,08:30:00,08:54:00,09:00:00,0.0,6.0
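
- The hour and minute columns above can be cross-checked outside the database with a single modular-arithmetic calculation. The following is a minimal sketch, not the notebook's method: it assumes the times are stored as text in the same `HH12MIPM` layout (for example `0758AM`), and the helper names `parse_hhmm` and `minutes_until_end` are hypothetical.

```python
from datetime import datetime

MINUTES_PER_DAY = 24 * 60


def parse_hhmm(text):
    """Parse a string such as '0758AM' into minutes since midnight."""
    parsed = datetime.strptime(text, "%I%M%p")
    return parsed.hour * 60 + parsed.minute


def minutes_until_end(violation_time, to_hours_in_effect):
    """Minutes from the violation to the end of the restriction.

    The modulo wraps past midnight, so the result is never negative.
    That assumes the violation happened before the restriction ended.
    """
    diff = parse_hhmm(to_hours_in_effect) - parse_hhmm(violation_time)
    return diff % MINUTES_PER_DAY


print(minutes_until_end("0758AM", "0800AM"))  # 2
print(minutes_until_end("1155PM", "1230AM"))  # 35
```

- Because the modulo always wraps forward, this sketch cannot tell a ticket issued two minutes before the end from one issued 23 hours 58 minutes after it. The CASE branches in the query exist to make that distinction.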


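### Violations after `to_hours_in_effect`
- Reuse the `diff_time` CTE and keep only the rows where the total difference is negative, meaning the violation was recorded after the restriction ended.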

In [536]:
%%sql

WITH diff_time AS (
SELECT
        summons_number,
        from_hours_in_effect,
        violation_time,  
        to_hours_in_effect,
        CASE 
            WHEN to_hours_in_effect > from_hours_in_effect
                AND to_hours_in_effect >= violation_time AND  violation_time >= from_hours_in_effect   
                 THEN  EXTRACT('hour' FROM to_hours_in_effect -  violation_time)               
            WHEN to_hours_in_effect > from_hours_in_effect 
                AND violation_time NOT BETWEEN from_hours_in_effect AND to_hours_in_effect
                 AND violation_time < '05:00:00'::time                  
                    THEN  24- EXTRACT('hour' FROM to_hours_in_effect) + EXTRACT('hour' FROM violation_time)
            WHEN to_hours_in_effect > from_hours_in_effect 
                AND violation_time NOT BETWEEN from_hours_in_effect AND to_hours_in_effect
                THEN  EXTRACT('hour' FROM to_hours_in_effect) - EXTRACT('hour' FROM violation_time)
            WHEN to_hours_in_effect < from_hours_in_effect
                    AND (violation_time > to_hours_in_effect AND violation_time < from_hours_in_effect)
                    THEN (- EXTRACT('hour' FROM violation_time ) + EXTRACT('hour' FROM to_hours_in_effect) )
            WHEN to_hours_in_effect < from_hours_in_effect
                     AND violation_time < to_hours_in_effect
                    THEN   EXTRACT('hour' FROM to_hours_in_effect) - EXTRACT('hour' FROM  violation_time)
            WHEN to_hours_in_effect < from_hours_in_effect
                     AND (violation_time > to_hours_in_effect)
                    THEN (24- EXTRACT('hour' FROM violation_time )+EXTRACT('hour' FROM to_hours_in_effect) )
            ELSE EXTRACT('hour' FROM to_hours_in_effect -  violation_time)     
            END AS hours,
        
        CASE 
            WHEN to_hours_in_effect > from_hours_in_effect
                AND to_hours_in_effect >= violation_time AND  violation_time >= from_hours_in_effect   
                 THEN  EXTRACT('minute' FROM to_hours_in_effect -  violation_time)               
            WHEN to_hours_in_effect > from_hours_in_effect 
                AND violation_time NOT BETWEEN from_hours_in_effect AND to_hours_in_effect
                 -- NOTE: this cutoff is 06:00, while the matching branch of the hours CASE uses 05:00
                 AND violation_time < '06:00:00'::time
                THEN  EXTRACT('minute' FROM to_hours_in_effect) + EXTRACT('minute' FROM violation_time)
            WHEN to_hours_in_effect > from_hours_in_effect 
                AND violation_time NOT BETWEEN from_hours_in_effect AND to_hours_in_effect
                THEN  EXTRACT('minute' FROM to_hours_in_effect) - EXTRACT('minute' FROM violation_time)
            WHEN to_hours_in_effect < from_hours_in_effect
                    AND (violation_time > to_hours_in_effect AND violation_time < from_hours_in_effect)
                   THEN (- EXTRACT('minute' FROM violation_time ) + EXTRACT('minute' FROM to_hours_in_effect) )
            WHEN to_hours_in_effect < from_hours_in_effect
                     AND violation_time < to_hours_in_effect
                    THEN   EXTRACT('minute' FROM to_hours_in_effect) - EXTRACT('minute' FROM  violation_time)
            WHEN to_hours_in_effect < from_hours_in_effect
                     AND (violation_time > to_hours_in_effect)
                    THEN (-EXTRACT('minute' FROM violation_time )+EXTRACT('minute' FROM to_hours_in_effect) )
            ELSE EXTRACT('minute' FROM to_hours_in_effect -  violation_time)
            END AS minutes          
    FROM 
        (SELECT
            summons_number,
                TO_TIMESTAMP(from_hours_in_effect, 'HH12MIPM')::time as from_hours_in_effect,
                TO_TIMESTAMP(to_hours_in_effect, 'HH12MIPM')::time as to_hours_in_effect,
                TO_TIMESTAMP(violation_time, 'HH12MIPM')::time as violation_time
          FROM
            parking_violation
          WHERE NOT (from_hours_in_effect = '1200AM' AND to_hours_in_effect = '1159PM')
        ) sub 
)

SELECT
    summons_number,
    from_hours_in_effect,
    violation_time,
    to_hours_in_effect,
    hours AS violation_after_to_hours,
    minutes
FROM diff_time
WHERE hours * 60 + minutes < 0;

 * postgresql://postgres:***@localhost:5432/NYC_Open_Data
16 rows affected.


summons_number,from_hours_in_effect,violation_time,to_hours_in_effect,violation_after_to_hours,minutes
4011361938,,20:34:00,10:30:00,-10.0,-4.0
4011361975,,20:48:00,10:30:00,-10.0,-18.0
4011362815,,10:52:00,10:30:00,0.0,-22.0
4011362827,,11:05:00,10:30:00,0.0,-35.0
4011362839,,11:08:00,10:30:00,0.0,-38.0
4011363534,,14:24:00,10:30:00,-3.0,-54.0
4011363601,,14:33:00,10:30:00,-4.0,-3.0
4011363613,,14:34:00,10:30:00,-4.0,-4.0
4011363625,,14:35:00,10:30:00,-4.0,-5.0
4011363583,,14:29:00,10:30:00,-3.0,-59.0
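
- In the rows shown, `from_hours_in_effect` is empty (NULL). Every `WHEN` condition compares against that column, so none of them evaluates to true and the `ELSE` branch applies: the plain difference `to_hours_in_effect - violation_time`, with hours and minutes carrying the same sign. The sketch below reproduces that arithmetic in Python for a few of the rows above. The function name `signed_hours_minutes` is hypothetical, and the code only illustrates the result, not how PostgreSQL computes it.

```python
def signed_hours_minutes(to_hours_in_effect, violation_time):
    """Split (to_hours_in_effect - violation_time) into signed hour and minute parts.

    Both arguments are 'HH:MM' or 'HH:MM:SS' strings; seconds are ignored.
    """
    def to_minutes(value):
        hours, minutes = value.split(":")[:2]
        return int(hours) * 60 + int(minutes)

    diff = to_minutes(to_hours_in_effect) - to_minutes(violation_time)
    sign = -1 if diff < 0 else 1
    # Split the absolute value, then reapply the sign to both parts.
    hours, minutes = divmod(abs(diff), 60)
    return sign * hours, sign * minutes


print(signed_hours_minutes("10:30:00", "20:34:00"))  # (-10, -4)
print(signed_hours_minutes("10:30:00", "14:24:00"))  # (-3, -54)
print(signed_hours_minutes("10:30:00", "10:52:00"))  # (0, -22)
```

- Filtering out rows where `from_hours_in_effect` is NULL would separate these cases from violations that fall after a fully specified restriction window.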
