#### To run SQL queries in a Jupyter notebook, we use the [`ipython-sql`](https://github.com/catherinedevlin/ipython-sql) module.

```bash
pip install ipython-sql
pip install psycopg2-binary  # PostgreSQL driver used to connect to Postgres
```
##### Load the extension with the `%load_ext` magic command.


In [1]:
# pip install ipython-sql
%load_ext sql

#### To connect to Postgres, we pass a connection string with the following syntax:
`postgresql://user:password@host:port/db_name`
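If the user name or password contains characters such as `@`, `:` or `/`, they have to be percent-encoded, otherwise the URL is split in the wrong place. Below is a minimal sketch using only the standard library; the `make_pg_url` helper and the `p@ss:word` password are hypothetical examples, not part of the notebook.

```python
from urllib.parse import quote_plus


def make_pg_url(user: str, password: str, host: str, port: int, db_name: str) -> str:
    """Build a postgresql:// URL (hypothetical helper), percent-encoding user and password."""
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{db_name}"
    )


# The credentials used in this notebook need no escaping...
print(make_pg_url("root", "root", "localhost", 5432, "ny_taxi"))
# → postgresql://root:root@localhost:5432/ny_taxi

# ...but a hypothetical password containing '@' and ':' does.
print(make_pg_url("root", "p@ss:word", "localhost", 5432, "ny_taxi"))
# → postgresql://root:p%40ss%3Aword@localhost:5432/ny_taxi
```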

In [2]:
%sql postgresql://root:root@localhost:5432/ny_taxi

%sql postgresql://

For this session we are going to use two tables of the database:

1. [`yellow_taxi_trips`](https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-01.csv) : trip records of NYC yellow taxis for January 2021 (pickup and dropoff times and locations, passenger counts, fares, etc.).
2. `taxi_zones` : the taxi zones (borough, zone name, service zone) that the pickup and dropoff location IDs of the trips refer to.

**`%sql` treats only the rest of its own line as the query, whereas `%%sql` treats the whole cell as a SQL query.**
  

In [4]:
# Start with a simple SQL query
%sql SELECT * FROM yellow_taxi_trips LIMIT 5;

 * postgresql://root:***@localhost:5432/ny_taxi
5 rows affected.


VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge
1,2021-01-01 00:30:10,2021-01-01 00:36:12,1,2.1,1,N,142,43,2,8.0,3.0,0.5,0.0,0.0,0.3,11.8,2.5
1,2021-01-01 00:51:20,2021-01-01 00:52:19,1,0.2,1,N,238,151,2,3.0,0.5,0.5,0.0,0.0,0.3,4.3,0.0
1,2021-01-01 00:43:30,2021-01-01 01:11:06,1,14.7,1,N,132,165,1,42.0,0.5,0.5,8.65,0.0,0.3,51.95,0.0
1,2021-01-01 00:15:48,2021-01-01 00:31:01,0,10.6,1,N,138,132,1,29.0,0.5,0.5,6.05,0.0,0.3,36.35,0.0
2,2021-01-01 00:31:49,2021-01-01 00:48:21,1,4.94,1,N,68,33,1,16.5,0.5,0.5,4.06,0.0,0.3,24.36,2.5


In [38]:
%sql SELECT * FROM taxi_zones LIMIT 5;

 * postgresql://root:***@localhost:5432/ny_taxi
5 rows affected.


LocationID,Borough,Zone,service_zone
1,EWR,Newark Airport,EWR
2,Queens,Jamaica Bay,Boro Zone
3,Bronx,Allerton/Pelham Gardens,Boro Zone
4,Manhattan,Alphabet City,Yellow Zone
5,Staten Island,Arden Heights,Boro Zone


### Exploring the database tables and data.

In [37]:
%%sql
SELECT *
FROM 
    yellow_taxi_trips yt JOIN taxi_zones tz
    ON yt."PULocationID" = tz."LocationID"
LIMIT 5;

 * postgresql://root:***@localhost:5432/ny_taxi
5 rows affected.


VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,LocationID,Borough,Zone,service_zone
1,2021-01-01 00:30:10,2021-01-01 00:36:12,1,2.1,1,N,142,43,2,8.0,3.0,0.5,0.0,0.0,0.3,11.8,2.5,142,Manhattan,Lincoln Square East,Yellow Zone
1,2021-01-01 00:51:20,2021-01-01 00:52:19,1,0.2,1,N,238,151,2,3.0,0.5,0.5,0.0,0.0,0.3,4.3,0.0,238,Manhattan,Upper West Side North,Yellow Zone
1,2021-01-01 00:43:30,2021-01-01 01:11:06,1,14.7,1,N,132,165,1,42.0,0.5,0.5,8.65,0.0,0.3,51.95,0.0,132,Queens,JFK Airport,Airports
1,2021-01-01 00:15:48,2021-01-01 00:31:01,0,10.6,1,N,138,132,1,29.0,0.5,0.5,6.05,0.0,0.3,36.35,0.0,138,Queens,LaGuardia Airport,Airports
2,2021-01-01 00:31:49,2021-01-01 00:48:21,1,4.94,1,N,68,33,1,16.5,0.5,0.5,4.06,0.0,0.3,24.36,2.5,68,Manhattan,East Chelsea,Yellow Zone


#### Select all columns from `yellow_taxi_trips` joined with `taxi_zones` twice (once for the pickup zone, once for the dropoff zone), returning only the first 5 rows.

In [45]:
%%sql
SELECT
    *
FROM
    yellow_taxi_trips t,
    taxi_zones zpu,
    taxi_zones zdo
WHERE
    t."PULocationID" = zpu."LocationID" AND
    t."DOLocationID" = zdo."LocationID"
LIMIT 5;

 * postgresql://root:***@localhost:5432/ny_taxi
5 rows affected.


VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,LocationID,Borough,Zone,service_zone,LocationID_1,Borough_1,Zone_1,service_zone_1
1,2021-01-01 00:30:10,2021-01-01 00:36:12,1,2.1,1,N,142,43,2,8.0,3.0,0.5,0.0,0.0,0.3,11.8,2.5,142,Manhattan,Lincoln Square East,Yellow Zone,43,Manhattan,Central Park,Yellow Zone
1,2021-01-01 00:51:20,2021-01-01 00:52:19,1,0.2,1,N,238,151,2,3.0,0.5,0.5,0.0,0.0,0.3,4.3,0.0,238,Manhattan,Upper West Side North,Yellow Zone,151,Manhattan,Manhattan Valley,Yellow Zone
1,2021-01-01 00:43:30,2021-01-01 01:11:06,1,14.7,1,N,132,165,1,42.0,0.5,0.5,8.65,0.0,0.3,51.95,0.0,132,Queens,JFK Airport,Airports,165,Brooklyn,Midwood,Boro Zone
1,2021-01-01 00:15:48,2021-01-01 00:31:01,0,10.6,1,N,138,132,1,29.0,0.5,0.5,6.05,0.0,0.3,36.35,0.0,138,Queens,LaGuardia Airport,Airports,132,Queens,JFK Airport,Airports
2,2021-01-01 00:31:49,2021-01-01 00:48:21,1,4.94,1,N,68,33,1,16.5,0.5,0.5,4.06,0.0,0.3,24.36,2.5,68,Manhattan,East Chelsea,Yellow Zone,33,Brooklyn,Brooklyn Heights,Boro Zone


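* We give aliases to the `yellow_taxi_trips` (`t`) and `taxi_zones` tables for easier access. `taxi_zones` appears twice: as `zpu` for the pickup zone and as `zdo` for the dropoff zone.
* We match the IDs in `PULocationID` and `DOLocationID` against `LocationID` in `taxi_zones`, so each trip row is extended with the details of its pickup and dropoff zones.
* We put the column names in double quotes (`""`) because Postgres folds unquoted identifiers to lowercase, so column names containing capital letters (such as `PULocationID`) must be quoted.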

In [46]:
%%sql
SELECT
    tpep_pickup_datetime,
    tpep_dropoff_datetime,
    total_amount,
    CONCAT(zpu."Borough", '/', zpu."Zone") AS "pickup_loc",
    CONCAT(zdo."Borough", '/', zdo."Zone") AS "dropoff_loc"
FROM
    yellow_taxi_trips t,
    taxi_zones zpu,
    taxi_zones zdo
WHERE
    t."PULocationID" = zpu."LocationID" AND
    t."DOLocationID" = zdo."LocationID"
LIMIT 5;

 * postgresql://root:***@localhost:5432/ny_taxi
5 rows affected.


tpep_pickup_datetime,tpep_dropoff_datetime,total_amount,pickup_loc,dropoff_loc
2021-01-01 00:30:10,2021-01-01 00:36:12,11.8,Manhattan/Lincoln Square East,Manhattan/Central Park
2021-01-01 00:51:20,2021-01-01 00:52:19,4.3,Manhattan/Upper West Side North,Manhattan/Manhattan Valley
2021-01-01 00:43:30,2021-01-01 01:11:06,51.95,Queens/JFK Airport,Brooklyn/Midwood
2021-01-01 00:15:48,2021-01-01 00:31:01,36.35,Queens/LaGuardia Airport,Queens/JFK Airport
2021-01-01 00:31:49,2021-01-01 00:48:21,24.36,Manhattan/East Chelsea,Brooklyn/Brooklyn Heights


* Same as the previous query, but instead of complete rows we display only specific columns.
* We use joins (implicit joins in this case, since the join conditions are written in the `WHERE` clause) and combine information from the zones table into single columns.
    * The new "virtual" column `pickup_loc` contains the values of both the `Borough` and `Zone` columns of the zones table, separated by a slash (`/`).
    * Same for `dropoff_loc`.
* More specifically, this is an inner join: only trips whose pickup and dropoff IDs both have a match in `taxi_zones` are returned.
* Learn more about SQL joins [here](https://dataschool.com/how-to-teach-people-sql/sql-join-types-explained-visually/) and [here](https://www.wikiwand.com/en/Join_(SQL)).
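To try the implicit join without a running Postgres server, here is a minimal sketch that assumes an in-memory SQLite database as a stand-in, loaded with a few rows copied from the outputs above. It uses the standard `||` operator instead of `CONCAT`, because `concat()` only exists in recent SQLite versions; note that `||` returns `NULL` if any operand is `NULL`, unlike Postgres `CONCAT`.

```python
import sqlite3

# In-memory SQLite stand-in for the Postgres tables; toy rows copied from the outputs above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE taxi_zones ("LocationID" INTEGER, "Borough" TEXT, "Zone" TEXT);
    INSERT INTO taxi_zones VALUES
        (43, 'Manhattan', 'Central Park'),
        (142, 'Manhattan', 'Lincoln Square East'),
        (151, 'Manhattan', 'Manhattan Valley'),
        (238, 'Manhattan', 'Upper West Side North');
    CREATE TABLE yellow_taxi_trips ("PULocationID" INTEGER, "DOLocationID" INTEGER, total_amount REAL);
    INSERT INTO yellow_taxi_trips VALUES (142, 43, 11.8), (238, 151, 4.3);
""")

# Implicit inner join: taxi_zones is listed twice under two aliases,
# zpu for the pickup zone and zdo for the dropoff zone.
rows = conn.execute("""
    SELECT
        t.total_amount,
        zpu."Borough" || '/' || zpu."Zone" AS pickup_loc,
        zdo."Borough" || '/' || zdo."Zone" AS dropoff_loc
    FROM yellow_taxi_trips t, taxi_zones zpu, taxi_zones zdo
    WHERE t."PULocationID" = zpu."LocationID"
      AND t."DOLocationID" = zdo."LocationID"
    ORDER BY t.total_amount DESC
""").fetchall()
for row in rows:
    print(row)
```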

In [48]:
%%sql
SELECT
    tpep_pickup_datetime,
    tpep_dropoff_datetime,
    total_amount,
    CONCAT(zpu."Borough", '/', zpu."Zone") AS "pickup_loc",
    CONCAT(zdo."Borough", '/', zdo."Zone") AS "dropoff_loc"
FROM
    yellow_taxi_trips t JOIN taxi_zones zpu
        ON t."PULocationID" = zpu."LocationID"
    JOIN taxi_zones zdo
        ON t."DOLocationID" = zdo."LocationID"
LIMIT 5;

 * postgresql://root:***@localhost:5432/ny_taxi
5 rows affected.


tpep_pickup_datetime,tpep_dropoff_datetime,total_amount,pickup_loc,dropoff_loc
2021-01-01 00:30:10,2021-01-01 00:36:12,11.8,Manhattan/Lincoln Square East,Manhattan/Central Park
2021-01-01 00:51:20,2021-01-01 00:52:19,4.3,Manhattan/Upper West Side North,Manhattan/Manhattan Valley
2021-01-01 00:43:30,2021-01-01 01:11:06,51.95,Queens/JFK Airport,Brooklyn/Midwood
2021-01-01 00:15:48,2021-01-01 00:31:01,36.35,Queens/LaGuardia Airport,Queens/JFK Airport
2021-01-01 00:31:49,2021-01-01 00:48:21,24.36,Manhattan/East Chelsea,Brooklyn/Brooklyn Heights


* Exactly the same statement as before, rewritten using explicit `JOIN` keywords.
    * Explicit inner joins are generally preferred over implicit ones because each join condition sits next to the table it joins instead of being mixed with filters in the `WHERE` clause.
* Each `JOIN` goes inside the `FROM` clause and is followed by an `ON` clause holding the join condition, so no `WHERE` clause is needed:
    ```sql
    SELECT some_columns FROM table_1 JOIN table_2 ON table_1.some_column = table_2.matching_column
    ```
* `JOIN` and `INNER JOIN` are equivalent; you can write `INNER JOIN` for clarity.
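A quick way to convince yourself that both forms are equivalent is to run them side by side. This minimal sketch assumes an in-memory SQLite database with toy rows; zone 165 is deliberately left out of the toy `taxi_zones` table, so both inner joins drop the trip that ends there.

```python
import sqlite3

# In-memory SQLite stand-in; zone 165 is deliberately missing from this toy table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE taxi_zones ("LocationID" INTEGER, "Zone" TEXT);
    INSERT INTO taxi_zones VALUES (43, 'Central Park'), (132, 'JFK Airport'),
                                  (138, 'LaGuardia Airport'), (142, 'Lincoln Square East');
    CREATE TABLE yellow_taxi_trips ("PULocationID" INTEGER, "DOLocationID" INTEGER);
    INSERT INTO yellow_taxi_trips VALUES (142, 43), (138, 132), (132, 165);
""")

implicit_sql = """
    SELECT zpu."Zone", zdo."Zone"
    FROM yellow_taxi_trips t, taxi_zones zpu, taxi_zones zdo
    WHERE t."PULocationID" = zpu."LocationID"
      AND t."DOLocationID" = zdo."LocationID"
"""
explicit_sql = """
    SELECT zpu."Zone", zdo."Zone"
    FROM yellow_taxi_trips t
    JOIN taxi_zones zpu ON t."PULocationID" = zpu."LocationID"
    JOIN taxi_zones zdo ON t."DOLocationID" = zdo."LocationID"
"""

# Sort in Python so the comparison does not depend on the order rows come back in.
implicit_rows = sorted(conn.execute(implicit_sql).fetchall())
explicit_rows = sorted(conn.execute(explicit_sql).fetchall())
print(implicit_rows == explicit_rows)  # → True
print(explicit_rows)
```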

##### Checking whether the `PULocationID` column contains NULL values.

In [50]:
%%sql
SELECT
    tpep_pickup_datetime,
    tpep_dropoff_datetime,
    total_amount,
    "PULocationID",
    "DOLocationID"
FROM
    yellow_taxi_trips
WHERE
    "PULocationID" is NULL
LIMIT 100;

 * postgresql://root:***@localhost:5432/ny_taxi
0 rows affected.


tpep_pickup_datetime,tpep_dropoff_datetime,total_amount,PULocationID,DOLocationID


##### Checking whether the `DOLocationID` column contains NULL values.

In [51]:
%%sql
SELECT
    tpep_pickup_datetime,
    tpep_dropoff_datetime,
    total_amount,
    "PULocationID",
    "DOLocationID"
FROM
    yellow_taxi_trips
WHERE
    "DOLocationID" is NULL
LIMIT 100;

 * postgresql://root:***@localhost:5432/ny_taxi
0 rows affected.


tpep_pickup_datetime,tpep_dropoff_datetime,total_amount,PULocationID,DOLocationID
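The two NULL checks above can also be done in a single pass: `COUNT(column)` skips NULLs while `COUNT(*)` counts every row, so their difference is the number of NULLs. A minimal sketch, assuming an in-memory SQLite database with hypothetical toy rows; the same `SELECT` should run unchanged against the Postgres table.

```python
import sqlite3

# In-memory SQLite stand-in with hypothetical toy rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yellow_taxi_trips ("PULocationID" INTEGER, "DOLocationID" INTEGER);
    -- The second and third rows deliberately contain NULLs.
    INSERT INTO yellow_taxi_trips VALUES (142, 43), (NULL, 151), (132, NULL), (138, 132);
""")

# COUNT(*) counts all rows, COUNT(col) only the non-NULL values of col.
pu_nulls, do_nulls = conn.execute("""
    SELECT
        COUNT(*) - COUNT("PULocationID") AS pu_nulls,
        COUNT(*) - COUNT("DOLocationID") AS do_nulls
    FROM yellow_taxi_trips
""").fetchone()
print(pu_nulls, do_nulls)  # → 1 1
```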


* Select rows from the `yellow_taxi_trips` table whose dropoff location ID does not appear in the `taxi_zones` table (the cell after it does the same for the pickup location ID).
* If you have not modified any rows in the original datasets, these queries should return no rows.
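`NOT IN` has one pitfall worth knowing: if the subquery returns even a single `NULL`, `x NOT IN (...)` is never true, so the query silently returns no rows. `NOT EXISTS` does not have this problem. A minimal sketch, assuming an in-memory SQLite database with toy rows in which zone 142 is missing and a `NULL` `LocationID` is hypothetically inserted afterwards.

```python
import sqlite3

# In-memory SQLite stand-in with toy rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE taxi_zones ("LocationID" INTEGER);
    INSERT INTO taxi_zones VALUES (43), (132), (138);   -- 142 is missing
    CREATE TABLE yellow_taxi_trips ("PULocationID" INTEGER, "DOLocationID" INTEGER);
    INSERT INTO yellow_taxi_trips VALUES (142, 43), (138, 132);
""")

not_in_sql = """
    SELECT "PULocationID" FROM yellow_taxi_trips
    WHERE "PULocationID" NOT IN (SELECT "LocationID" FROM taxi_zones)
"""
not_exists_sql = """
    SELECT t."PULocationID" FROM yellow_taxi_trips t
    WHERE NOT EXISTS (
        SELECT 1 FROM taxi_zones z WHERE z."LocationID" = t."PULocationID"
    )
"""

# Both queries find the orphaned pickup ID 142.
before = (conn.execute(not_in_sql).fetchall(), conn.execute(not_exists_sql).fetchall())
print(before)  # → ([(142,)], [(142,)])

# After a hypothetical NULL LocationID appears, NOT IN returns nothing.
conn.execute("INSERT INTO taxi_zones VALUES (NULL)")
after = (conn.execute(not_in_sql).fetchall(), conn.execute(not_exists_sql).fetchall())
print(after)   # → ([], [(142,)])
```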

In [52]:
%%sql
SELECT
    tpep_pickup_datetime,
    tpep_dropoff_datetime,
    total_amount,
    "PULocationID",
    "DOLocationID"
FROM
    yellow_taxi_trips
WHERE
    "DOLocationID" NOT IN (
        SELECT "LocationID" FROM taxi_zones
    )
LIMIT 100;

 * postgresql://root:***@localhost:5432/ny_taxi
0 rows affected.


tpep_pickup_datetime,tpep_dropoff_datetime,total_amount,PULocationID,DOLocationID


In [5]:
%%sql
SELECT
    tpep_pickup_datetime,
    tpep_dropoff_datetime,
    total_amount,
    "PULocationID",
    "DOLocationID"
FROM
    yellow_taxi_trips
WHERE
    "PULocationID" NOT IN (
        SELECT "LocationID" FROM taxi_zones
    )
LIMIT 100;

 * postgresql://root:***@localhost:5432/ny_taxi
0 rows affected.


tpep_pickup_datetime,tpep_dropoff_datetime,total_amount,PULocationID,DOLocationID


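* Delete the record with `LocationID` 142 from the `taxi_zones` table, so that we can see what happens to the joins when a zone is missing. Afterwards, the `NOT IN` query on `PULocationID` returns the trips that start in zone 142.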

In [6]:
%%sql
DELETE FROM taxi_zones WHERE "LocationID" = 142;

 * postgresql://root:***@localhost:5432/ny_taxi
1 rows affected.


[]

In [11]:
%%sql
SELECT
    tpep_pickup_datetime,
    tpep_dropoff_datetime,
    total_amount,
    "PULocationID",
    "DOLocationID"
FROM
    yellow_taxi_trips
WHERE
    "PULocationID" NOT IN (
        SELECT "LocationID" FROM taxi_zones
    )
LIMIT 10;

 * postgresql://root:***@localhost:5432/ny_taxi
10 rows affected.


tpep_pickup_datetime,tpep_dropoff_datetime,total_amount,PULocationID,DOLocationID
2021-01-01 00:30:10,2021-01-01 00:36:12,11.8,142,43
2021-01-01 00:31:06,2021-01-01 00:38:52,14.16,142,50
2021-01-01 00:33:38,2021-01-01 00:38:37,12.09,142,239
2021-01-01 00:57:26,2021-01-01 01:01:27,8.8,142,48
2021-01-01 00:25:23,2021-01-01 00:31:57,12.96,142,238
2021-01-01 00:18:14,2021-01-01 00:27:29,17.8,142,229
2021-01-01 00:02:00,2021-01-01 00:13:59,15.3,142,263
2021-01-01 00:32:01,2021-01-01 00:44:55,20.8,142,116
2021-01-01 00:32:58,2021-01-01 00:40:59,15.8,142,75
2021-01-01 00:15:41,2021-01-01 00:23:06,14.75,142,236


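#### The statement below keeps only the trips whose `PULocationID` matches a `LocationID` (inner join), but keeps every trip regardless of whether its `DOLocationID` matches (left join). An unmatched dropoff zone shows up as `/`, as in the second-to-last row of the output.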

In [12]:
%%sql
SELECT
    tpep_pickup_datetime,
    tpep_dropoff_datetime,
    total_amount,
    CONCAT(zpu."Borough", '/', zpu."Zone") AS "pickup_loc",
    CONCAT(zdo."Borough", '/', zdo."Zone") AS "dropoff_loc"
FROM
    yellow_taxi_trips yt JOIN taxi_zones zpu
        ON yt."PULocationID" = zpu."LocationID"
    LEFT JOIN taxi_zones zdo
        ON yt."DOLocationID" = zdo."LocationID"
LIMIT 10;

 * postgresql://root:***@localhost:5432/ny_taxi
10 rows affected.


tpep_pickup_datetime,tpep_dropoff_datetime,total_amount,pickup_loc,dropoff_loc
2021-01-01 00:51:20,2021-01-01 00:52:19,4.3,Manhattan/Upper West Side North,Manhattan/Manhattan Valley
2021-01-01 00:43:30,2021-01-01 01:11:06,51.95,Queens/JFK Airport,Brooklyn/Midwood
2021-01-01 00:15:48,2021-01-01 00:31:01,36.35,Queens/LaGuardia Airport,Queens/JFK Airport
2021-01-01 00:31:49,2021-01-01 00:48:21,24.36,Manhattan/East Chelsea,Brooklyn/Brooklyn Heights
2021-01-01 00:16:29,2021-01-01 00:24:30,14.15,Manhattan/Stuy Town/Peter Cooper Village,Manhattan/East Chelsea
2021-01-01 00:00:28,2021-01-01 00:17:28,17.3,Queens/Forest Hills,Queens/Maspeth
2021-01-01 00:12:29,2021-01-01 00:30:34,21.8,Manhattan/Flatiron,Brooklyn/Carroll Gardens
2021-01-01 00:39:16,2021-01-01 01:00:13,28.8,Brooklyn/Fort Greene,Queens/Jackson Heights
2021-01-01 00:26:12,2021-01-01 00:39:46,18.95,Manhattan/Yorkville West,/
2021-01-01 00:15:52,2021-01-01 00:38:07,24.3,Manhattan/Midtown South,Brooklyn/Williamsburg (North Side)


In [14]:
%%sql
SELECT COUNT(1) FROM (
    SELECT
        tpep_pickup_datetime,
        tpep_dropoff_datetime,
        total_amount,
        CONCAT(zpu."Borough", '/', zpu."Zone") AS "pickup_loc",
        CONCAT(zdo."Borough", '/', zdo."Zone") AS "dropoff_loc"
    FROM
        yellow_taxi_trips yt JOIN taxi_zones zpu
            ON yt."PULocationID" = zpu."LocationID"
        LEFT JOIN taxi_zones zdo
            ON yt."DOLocationID" = zdo."LocationID"
    ) AS subtable;

 * postgresql://root:***@localhost:5432/ny_taxi
1 rows affected.


count
1327650


* A ***left join*** returns every row from the "left" table of the statement and only the matching rows from the "right" table, hence the name; where the right table has no match, its columns are filled with `NULL`.
* This join is useful after deleting one of the `LocationID` rows, as we did above. The inner join drops the trips whose location ID no longer has a match, but the query below (left joins for both the pickup and the dropoff zone) returns every trip. For the missing ID, `Borough` and `Zone` are `NULL`, and since `CONCAT` ignores `NULL` arguments, the "virtual" location columns show just `/`, as in the first row of the output.
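The same behaviour can be sketched with an in-memory SQLite database holding toy rows, assuming zone 142 has been deleted as above. SQLite's `||` returns `NULL` when an operand is `NULL`, so this sketch wraps the columns in `COALESCE(..., '')` to mimic the `/` that Postgres `CONCAT` produces; the extra `pickup_unmatched` column is a hypothetical addition that flags the trips without a pickup match.

```python
import sqlite3

# In-memory SQLite stand-in; zone 142 (Lincoln Square East) has been deleted.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE taxi_zones ("LocationID" INTEGER, "Borough" TEXT, "Zone" TEXT);
    INSERT INTO taxi_zones VALUES (43, 'Manhattan', 'Central Park'),
                                  (151, 'Manhattan', 'Manhattan Valley'),
                                  (238, 'Manhattan', 'Upper West Side North');
    CREATE TABLE yellow_taxi_trips ("PULocationID" INTEGER, "DOLocationID" INTEGER);
    INSERT INTO yellow_taxi_trips VALUES (142, 43), (238, 151);
""")

rows = conn.execute("""
    SELECT
        COALESCE(zpu."Borough", '') || '/' || COALESCE(zpu."Zone", '') AS pickup_loc,
        COALESCE(zdo."Borough", '') || '/' || COALESCE(zdo."Zone", '') AS dropoff_loc,
        zpu."LocationID" IS NULL AS pickup_unmatched   -- 1 when no pickup zone matched
    FROM yellow_taxi_trips yt
    LEFT JOIN taxi_zones zpu ON yt."PULocationID" = zpu."LocationID"
    LEFT JOIN taxi_zones zdo ON yt."DOLocationID" = zdo."LocationID"
    ORDER BY yt."PULocationID"
""").fetchall()
for row in rows:
    print(row)
```

The trip starting in the deleted zone 142 is kept, with `/` as its pickup location.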

In [15]:
%%sql
SELECT
    tpep_pickup_datetime,
    tpep_dropoff_datetime,
    total_amount,
    CONCAT(zpu."Borough", '/', zpu."Zone") AS "pickup_loc",
    CONCAT(zdo."Borough", '/', zdo."Zone") AS "dropoff_loc"
FROM
    yellow_taxi_trips yt LEFT JOIN taxi_zones zpu
        ON yt."PULocationID" = zpu."LocationID"
    LEFT JOIN taxi_zones zdo
        ON yt."DOLocationID" = zdo."LocationID"
LIMIT 10;

 * postgresql://root:***@localhost:5432/ny_taxi
10 rows affected.


tpep_pickup_datetime,tpep_dropoff_datetime,total_amount,pickup_loc,dropoff_loc
2021-01-01 00:30:10,2021-01-01 00:36:12,11.8,/,Manhattan/Central Park
2021-01-01 00:51:20,2021-01-01 00:52:19,4.3,Manhattan/Upper West Side North,Manhattan/Manhattan Valley
2021-01-01 00:43:30,2021-01-01 01:11:06,51.95,Queens/JFK Airport,Brooklyn/Midwood
2021-01-01 00:15:48,2021-01-01 00:31:01,36.35,Queens/LaGuardia Airport,Queens/JFK Airport
2021-01-01 00:31:49,2021-01-01 00:48:21,24.36,Manhattan/East Chelsea,Brooklyn/Brooklyn Heights
2021-01-01 00:16:29,2021-01-01 00:24:30,14.15,Manhattan/Stuy Town/Peter Cooper Village,Manhattan/East Chelsea
2021-01-01 00:00:28,2021-01-01 00:17:28,17.3,Queens/Forest Hills,Queens/Maspeth
2021-01-01 00:12:29,2021-01-01 00:30:34,21.8,Manhattan/Flatiron,Brooklyn/Carroll Gardens
2021-01-01 00:39:16,2021-01-01 01:00:13,28.8,Brooklyn/Fort Greene,Queens/Jackson Heights
2021-01-01 00:26:12,2021-01-01 00:39:46,18.95,Manhattan/Yorkville West,/


In [16]:
%%sql
SELECT COUNT(1) FROM (
    SELECT
        tpep_pickup_datetime,
        tpep_dropoff_datetime,
        total_amount,
        CONCAT(zpu."Borough", '/', zpu."Zone") AS "pickup_loc",
        CONCAT(zdo."Borough", '/', zdo."Zone") AS "dropoff_loc"
    FROM
        yellow_taxi_trips yt LEFT JOIN taxi_zones zpu
            ON yt."PULocationID" = zpu."LocationID"
        LEFT JOIN taxi_zones zdo
            ON yt."DOLocationID" = zdo."LocationID"
    ) AS subtable;

 * postgresql://root:***@localhost:5432/ny_taxi
1 rows affected.


count
1369765


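### Comparing the row counts of `JOIN` and `LEFT JOIN`
The `LEFT JOIN` query returns 1,369,765 rows, versus 1,327,650 for the query with an inner join on the pickup zone: the left join returns all rows of the `yellow_taxi_trips` table, including the 42,115 trips whose pickup `LocationID` (142) was deleted from `taxi_zones`.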
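The gap between the two counts is the number of trips with no matching pickup zone, which an anti-join (a `LEFT JOIN` followed by `WHERE <right key> IS NULL`) counts directly. A minimal sketch with toy rows in an in-memory SQLite database, assuming `LocationID` is unique in `taxi_zones` (duplicate IDs would multiply rows in both joins and break the equality).

```python
import sqlite3

# In-memory SQLite stand-in; zone 142 has been deleted from this toy table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE taxi_zones ("LocationID" INTEGER PRIMARY KEY);
    INSERT INTO taxi_zones VALUES (43), (132), (138);   -- 142 deleted
    CREATE TABLE yellow_taxi_trips ("PULocationID" INTEGER, "DOLocationID" INTEGER);
    INSERT INTO yellow_taxi_trips VALUES (142, 43), (142, 132), (138, 132), (132, 43);
""")


def scalar(sql: str) -> int:
    """Run a query that returns a single value and return it."""
    return conn.execute(sql).fetchone()[0]


join_clause = (
    'FROM yellow_taxi_trips yt {kind} JOIN taxi_zones zpu '
    'ON yt."PULocationID" = zpu."LocationID"'
)
inner = scalar("SELECT COUNT(1) " + join_clause.format(kind="INNER"))
left = scalar("SELECT COUNT(1) " + join_clause.format(kind="LEFT"))
# Anti-join: keep only the left-join rows that found no matching zone.
unmatched = scalar(
    "SELECT COUNT(1) " + join_clause.format(kind="LEFT") + ' WHERE zpu."LocationID" IS NULL'
)
print(inner, left, unmatched)  # → 2 4 2
```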

* `DATE_TRUNC` truncates a timestamp. With `'DAY'` as the first argument, it zeroes out the smaller units (hours, minutes, seconds), so the time part is displayed as `00:00:00`.
* `CAST(... AS DATE)` converts the timestamp to a plain date, dropping the time part entirely.
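SQLite has neither `DATE_TRUNC` nor a `DATE` type, so this sketch assumes its built-in date functions as stand-ins: `datetime(x, 'start of day')` behaves like `DATE_TRUNC('DAY', x)`, and `date(x)` like `CAST(x AS DATE)`. The timestamp is the dropoff time of the third trip in the outputs above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
ts = "2021-01-01 01:11:06"  # tpep_dropoff_datetime of the third trip above

# SQLite stand-ins for DATE_TRUNC('DAY', ts) and CAST(ts AS DATE).
truncated, as_date = conn.execute(
    "SELECT datetime(?, 'start of day'), date(?)", (ts, ts)
).fetchone()
print(truncated)  # → 2021-01-01 00:00:00
print(as_date)    # → 2021-01-01
```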

In [18]:
%%sql
SELECT 
    tpep_pickup_datetime,
    tpep_dropoff_datetime,
    DATE_TRUNC('DAY', tpep_dropoff_datetime),
    total_amount
FROM
    yellow_taxi_trips
LIMIT 10;

 * postgresql://root:***@localhost:5432/ny_taxi
10 rows affected.


tpep_pickup_datetime,tpep_dropoff_datetime,date_trunc,total_amount
2021-01-01 00:30:10,2021-01-01 00:36:12,2021-01-01 00:00:00,11.8
2021-01-01 00:51:20,2021-01-01 00:52:19,2021-01-01 00:00:00,4.3
2021-01-01 00:43:30,2021-01-01 01:11:06,2021-01-01 00:00:00,51.95
2021-01-01 00:15:48,2021-01-01 00:31:01,2021-01-01 00:00:00,36.35
2021-01-01 00:31:49,2021-01-01 00:48:21,2021-01-01 00:00:00,24.36
2021-01-01 00:16:29,2021-01-01 00:24:30,2021-01-01 00:00:00,14.15
2021-01-01 00:00:28,2021-01-01 00:17:28,2021-01-01 00:00:00,17.3
2021-01-01 00:12:29,2021-01-01 00:30:34,2021-01-01 00:00:00,21.8
2021-01-01 00:39:16,2021-01-01 01:00:13,2021-01-01 00:00:00,28.8
2021-01-01 00:26:12,2021-01-01 00:39:46,2021-01-01 00:00:00,18.95


In [19]:
%%sql
SELECT 
    tpep_pickup_datetime,
    tpep_dropoff_datetime,
    CAST(tpep_dropoff_datetime AS DATE),
    total_amount
FROM
    yellow_taxi_trips
LIMIT 10;

 * postgresql://root:***@localhost:5432/ny_taxi
10 rows affected.


tpep_pickup_datetime,tpep_dropoff_datetime,tpep_dropoff_datetime_1,total_amount
2021-01-01 00:30:10,2021-01-01 00:36:12,2021-01-01,11.8
2021-01-01 00:51:20,2021-01-01 00:52:19,2021-01-01,4.3
2021-01-01 00:43:30,2021-01-01 01:11:06,2021-01-01,51.95
2021-01-01 00:15:48,2021-01-01 00:31:01,2021-01-01,36.35
2021-01-01 00:31:49,2021-01-01 00:48:21,2021-01-01,24.36
2021-01-01 00:16:29,2021-01-01 00:24:30,2021-01-01,14.15
2021-01-01 00:00:28,2021-01-01 00:17:28,2021-01-01,17.3
2021-01-01 00:12:29,2021-01-01 00:30:34,2021-01-01,21.8
2021-01-01 00:39:16,2021-01-01 01:00:13,2021-01-01,28.8
2021-01-01 00:26:12,2021-01-01 00:39:46,2021-01-01,18.95


In [9]:
%%sql
SELECT 
    CAST(tpep_dropoff_datetime AS DATE) AS "day",
    COUNT(1)
FROM
    yellow_taxi_trips
GROUP BY
    CAST(tpep_dropoff_datetime AS DATE)
ORDER BY day ASC;

 * postgresql://root:***@localhost:5432/ny_taxi
37 rows affected.


day,count
2008-12-31,1
2009-01-01,3
2020-10-13,1
2020-12-31,9
2021-01-01,24672
2021-01-02,34230
2021-01-03,26374
2021-01-04,44588
2021-01-05,46882
2021-01-06,49549
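The query above groups the trips by dropoff date and counts the rows in each group. Here is a minimal sketch of the same pattern, assuming an in-memory SQLite database with hypothetical toy timestamps and using `date()` as a stand-in for `CAST(... AS DATE)`.

```python
import sqlite3

# In-memory SQLite stand-in with hypothetical toy timestamps.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yellow_taxi_trips (tpep_dropoff_datetime TEXT);
    INSERT INTO yellow_taxi_trips VALUES
        ('2021-01-02 08:15:00'), ('2021-01-01 00:36:12'),
        ('2021-01-01 01:11:06'), ('2021-01-02 23:59:59'), ('2021-01-02 12:00:00');
""")

# One output row per calendar day, with the number of trips that ended that day.
rows = conn.execute("""
    SELECT date(tpep_dropoff_datetime) AS day, COUNT(1) AS count
    FROM yellow_taxi_trips
    GROUP BY date(tpep_dropoff_datetime)
    ORDER BY day ASC
""").fetchall()
print(rows)  # → [('2021-01-01', 2), ('2021-01-02', 3)]
```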


### Storing the output as a DataFrame for further analysis in pandas.

In [7]:
import pandas as pd
from sqlalchemy import create_engine

# Replace with your database connection details
engine = create_engine('postgresql://root:root@localhost:5432/ny_taxi')

# Execute the SQL query and store the results as a DataFrame.
# There is no ORDER BY, so Postgres may return the groups in any order (see the output below).
df = pd.read_sql_query("""
    SELECT 
        CAST(tpep_dropoff_datetime AS DATE) AS "day",
        COUNT(1)
    FROM
        yellow_taxi_trips
    GROUP BY
        CAST(tpep_dropoff_datetime AS DATE)
""", engine)

# Display the first five rows to verify the results
df.head()

,day,count
0,2021-01-21,53246
1,2009-01-01,3
2,2021-01-09,39935
3,2021-01-11,46875
4,2021-01-29,54601


In [8]:
%%sql
SELECT 
    CAST(tpep_dropoff_datetime AS DATE) AS "day",
    COUNT(1) as count
FROM
    yellow_taxi_trips
GROUP BY
    CAST(tpep_dropoff_datetime AS DATE)
ORDER BY count DESC
LIMIT 5;

 * postgresql://root:***@localhost:5432/ny_taxi
5 rows affected.


day,count
2021-01-28,56385
2021-01-29,54601
2021-01-22,54225
2021-01-21,53246
2021-01-14,53019


In [15]:
%%sql
SELECT 
    CAST(tpep_dropoff_datetime AS DATE) AS "day",
    COUNT(1) as count,
    MAX(total_amount) AS max_amount,
    MAX(passenger_count) AS max_no_of_passengers
FROM
    yellow_taxi_trips
GROUP BY
    CAST(tpep_dropoff_datetime AS DATE)
ORDER BY 3 DESC
LIMIT 10;

 * postgresql://root:***@localhost:5432/ny_taxi
10 rows affected.


day,count,max_amount,max_no_of_passengers
2021-01-04,44588,7661.28,6
2021-01-20,49447,2292.4,6
2021-01-12,50115,1155.65,7
2021-01-03,26374,1108.2,6
2021-01-10,29873,900.35,6
2021-01-19,51120,894.2,7
2021-01-06,49549,872.54,6
2021-01-07,50245,872.05,6
2021-01-27,52676,831.0,6
2021-01-08,50467,815.05,6


In [21]:
%%sql
SELECT 
    CAST(tpep_dropoff_datetime AS DATE) AS "day",
    "DOLocationID",
    COUNT(1) as count,
    MAX(total_amount) AS max_amount,
    MAX(passenger_count) AS max_no_of_passengers
FROM
    yellow_taxi_trips
GROUP BY
    1, 2
ORDER BY 
    1 ASC, 2 ASC
LIMIT 10;

 * postgresql://root:***@localhost:5432/ny_taxi
10 rows affected.


day,DOLocationID,count,max_amount,max_no_of_passengers
2008-12-31,193,1,0.0,1
2009-01-01,10,1,14.3,1
2009-01-01,89,1,10.3,1
2009-01-01,238,1,11.8,6
2020-10-13,234,1,13.3,1
2020-12-31,68,1,14.3,1
2020-12-31,74,2,12.8,1
2020-12-31,79,1,14.12,1
2020-12-31,137,1,24.8,1
2020-12-31,213,1,53.3,1
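The last two queries refer to columns by their position in the `SELECT` list: `ORDER BY 3` sorts by the third output column (`max_amount`), and `GROUP BY 1, 2` groups by the first two (`day` and `DOLocationID`). This is concise, but the meaning changes silently if the `SELECT` list is reordered. A minimal sketch, assuming an in-memory SQLite database (which also accepts ordinals in `GROUP BY` and `ORDER BY`) with hypothetical toy rows; with `DOLocationID` in the list, `max_amount` is the fourth column.

```python
import sqlite3

# In-memory SQLite stand-in with hypothetical toy rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yellow_taxi_trips (
        tpep_dropoff_datetime TEXT, "DOLocationID" INTEGER,
        total_amount REAL, passenger_count INTEGER
    );
    INSERT INTO yellow_taxi_trips VALUES
        ('2021-01-01 00:36:12', 43, 11.8, 1),
        ('2021-01-01 00:52:19', 151, 4.3, 1),
        ('2021-01-01 09:10:00', 43, 25.0, 3),
        ('2021-01-02 10:00:00', 43, 7.5, 2);
""")

# GROUP BY 1, 2  -> group by the 1st and 2nd SELECT expressions (day, DOLocationID).
# ORDER BY 4 DESC -> sort by the 4th output column (max_amount), largest first.
rows = conn.execute("""
    SELECT
        date(tpep_dropoff_datetime) AS day,
        "DOLocationID",
        COUNT(1) AS count,
        MAX(total_amount) AS max_amount,
        MAX(passenger_count) AS max_no_of_passengers
    FROM yellow_taxi_trips
    GROUP BY 1, 2
    ORDER BY 4 DESC
""").fetchall()
for row in rows:
    print(row)
```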
