# Week 1 - Iterating through data using Python

## A real data challenge – a massive dataset

You want to explore `NYC EMS data since 2011` to see if COVID was already spreading through NYC well before the stay-at-home order in mid-March 2020.

But you're saddled with a <a href="https://www.dropbox.com/scl/fi/qni8e1g8pv7qc21y2ijp3/EMS_Incident_Dispatch_Data_20240407.csv.zip?rlkey=ybnhybbl3ljas5xa8f1e37msx&st=xqrn50vd&dl=0">massive file</a>: it's `6+GB` and holds `roughly 26 million` rows of data.

What is your strategy? 











Our strategy is to `iterate`, or `loop`, through the data in smaller chunks and analyze each one. That way we still analyze the entire dataset and can reconstitute complete results that represent all 6+GB of data.
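Here is a plain-Python sketch of that idea, before we bring in pandas. It's a simplified illustration, not the workflow we'll use on the real file: `call_types` is a short, made-up stand-in for one column of the big dataset, and `chunk_size` is tiny so you can follow along. Counting each chunk separately and adding the partial counts together produces the same totals as counting everything at once, which is what lets chunked results be reconstituted into a complete answer.

```python
from collections import Counter

# Hypothetical stand-in for one column of a huge file (not real EMS data)
call_types = ["SICK", "UNC", "SICK", "RESPIR", "EDP", "SICK", "UNC", "RESPIR", "SICK"]
chunk_size = 4  # rows per chunk; the real file uses much bigger chunks

total = Counter()
for start in range(0, len(call_types), chunk_size):
    chunk = call_types[start:start + chunk_size]  # one slice of the data
    total.update(chunk)  # add this chunk's counts to the running total

# The reconstituted result matches a count over the whole list
print(total == Counter(call_types))  # True
print(total["SICK"])                 # 4
```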

We need to learn a couple of fundamental Python techniques that will help us extend Pandas' abilities.




### 1. `for loops`... a data journalist's favorite Python tool

We can use a `for loop` to **iterate** (to do the same series of steps in a process over and over again), including:
* running some calculation on each value stored in a list;
* opening and reading a list of files;
* literally an endless series of important tasks.
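As a small example of the first bullet, this sketch runs the same calculation on every value in a list. The numbers are the `DISPATCH_RESPONSE_SECONDS_QY` values from the first five rows of the EMS file previewed later in this notebook; the variable names are illustrative.

```python
# Dispatch response times in seconds, copied from the first five rows of the
# EMS file preview (DISPATCH_RESPONSE_SECONDS_QY column)
response_seconds = [87, 0, 313, 35, 4139]

for seconds in response_seconds:
    # the same calculation runs once for every value in the list
    minutes, secs = divmod(seconds, 60)  # whole minutes and leftover seconds
    print(f"{seconds} seconds = {minutes} min {secs} sec")
```

The last line printed is `4139 seconds = 68 min 59 sec`: the loop body ran five times, once per value.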

In [4]:
## name dog lucy
my_dog = "Lucy"
my_dog

'Lucy'

In [5]:
## upper case the previous variable 
## you can target an individual item
my_dog.upper()

'LUCY'

In [6]:
## run this list
animals = ["dogs", "cats", "birds", "elephants"]


In [7]:
##call our list
animals

['dogs', 'cats', 'birds', 'elephants']

In [8]:
## upper case and print each animal in our list
## this will break: a list has no .upper() method, so Python raises an AttributeError
# animals.upper()

In [9]:
animals

['dogs', 'cats', 'birds', 'elephants']

In [10]:
## recall that we can index and slice a list
## (only the result of the last line in a cell is displayed)

animals[0]
animals[1:3]

['cats', 'birds']

In [11]:
## we can target an individual item in a list by its index
animals[1].upper()

'CATS'

**Changing items one at a time doesn't scale, but we can iterate through all of them using a `for loop`.**

In [13]:
## use a for loop to upper case each animal and print it
for animal in animals:
    print(animal.upper())

DOGS
CATS
BIRDS
ELEPHANTS


### What's happening in a `for loop`:

<img src="https://sandeepmj.github.io/image-host/forloop3.png">


<img src="https://sandeepmj.github.io/image-host/forloop4.png">


<img src="https://sandeepmj.github.io/image-host/forloop5.png">


<img src="https://sandeepmj.github.io/image-host/forloop6.png">



<img src="https://sandeepmj.github.io/image-host/forloop7.png">


<img src="https://sandeepmj.github.io/image-host/forloop8.png">


<img src="https://sandeepmj.github.io/image-host/forloop9.png">


<img src="https://sandeepmj.github.io/image-host/forloop10.png">

### To recap:
<img src="https://sandeepmj.github.io/image-host/forloop6.png">
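Under the hood, a `for` loop asks the list for an iterator and keeps calling `next()` on it until Python raises `StopIteration`. The sketch below spells that out with a `while` loop. It's a simplified model for intuition, not something you'd write in practice, and `animal_iterator` and `seen` are illustrative names.

```python
animals = ["dogs", "cats", "birds", "elephants"]

# Roughly what `for animal in animals:` does behind the scenes
animal_iterator = iter(animals)  # ask the list for an iterator
seen = []
while True:
    try:
        animal = next(animal_iterator)  # grab the next item
    except StopIteration:               # no items left, so the loop ends
        break
    seen.append(animal.upper())         # the indented "body" of the loop

print(seen)    # ['DOGS', 'CATS', 'BIRDS', 'ELEPHANTS']
print(animal)  # elephants: the loop variable keeps its last value
```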


In [23]:
## rerun the loop with a different variable name; .title() capitalizes each animal

for x in animals:
    print(x.title())

Dogs
Cats
Birds
Elephants


In [24]:
## call x: after the loop ends, the loop variable still holds the last item
x

'elephants'

In [25]:
## did our list change? Call the animals list
animals

['dogs', 'cats', 'birds', 'elephants']

### 2. `append()`

The `append()` method adds a value to the end of an existing list. To collect new values, we first create an empty list and then append to it.

```python
new_list = []
new_list.append(some_value)
```
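One common stumble: `append()` changes the list in place and returns `None`. This sketch (with illustrative values) shows the right pattern and the mistake of assigning the result of `append()` to a variable.

```python
new_list = []
new_list.append("dogs")
new_list.append("cats")
print(new_list)  # ['dogs', 'cats']

# Mistake: append() returns None, so `result` does not hold the list
result = new_list.append("birds")
print(result)    # None
print(new_list)  # ['dogs', 'cats', 'birds']: the list itself was still updated
```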


In [27]:
upper_case_animals = []
upper_case_animals

[]

In [28]:
## We save our iterated data by adding to an empty list

for animal in animals:
    upper_case_animals.append(animal.upper())
    # print(upper_case_animals)


upper_case_animals


['DOGS', 'CATS', 'BIRDS', 'ELEPHANTS']

In [29]:
## call upper animals
upper_case_animals

['DOGS', 'CATS', 'BIRDS', 'ELEPHANTS']

## Let's take **for loops** for a test drive:

### Combine different data points together 

#### You scrape some URLs and place them in a list called myURLS (provided below):

In [31]:
## run this cell to activate the list
myURLS = [
    'great-unique-data-1.html',
    'great-unique-data-2.html',
    'great-unique-data-3.html',
    'great-unique-data-4.html',
    'great-unique-data-5.html',
    'great-unique-data-6.html',
    'great-unique-data-7.html',
    'great-unique-data-8.html',
    'great-unique-data-9.html',
    'great-unique-data-10.html',
    'great-unique-data-11.html',
    'great-unique-data-12.html',
    'great-unique-data-13.html',
    'great-unique-data-14.html',
    'great-unique-data-15.html'
]

myURLS

['great-unique-data-1.html',
 'great-unique-data-2.html',
 'great-unique-data-3.html',
 'great-unique-data-4.html',
 'great-unique-data-5.html',
 'great-unique-data-6.html',
 'great-unique-data-7.html',
 'great-unique-data-8.html',
 'great-unique-data-9.html',
 'great-unique-data-10.html',
 'great-unique-data-11.html',
 'great-unique-data-12.html',
 'great-unique-data-13.html',
 'great-unique-data-14.html',
 'great-unique-data-15.html']

* You realize that these URLs are missing the base `http://www.importantsite.com/`.
* Use a `for loop` to join the base URL to every partial URL in your list.
* Print each FULL URL.

Each one should look like `http://www.importantsite.com/great-unique-data-14.html`, but with its own number.

In [34]:
## for loop and print
base_url = "http://www.importantsite.com/"
for myURL in myURLS:
    print(f"{base_url}{myURL}")
    


http://www.importantsite.com/great-unique-data-1.html
http://www.importantsite.com/great-unique-data-2.html
http://www.importantsite.com/great-unique-data-3.html
http://www.importantsite.com/great-unique-data-4.html
http://www.importantsite.com/great-unique-data-5.html
http://www.importantsite.com/great-unique-data-6.html
http://www.importantsite.com/great-unique-data-7.html
http://www.importantsite.com/great-unique-data-8.html
http://www.importantsite.com/great-unique-data-9.html
http://www.importantsite.com/great-unique-data-10.html
http://www.importantsite.com/great-unique-data-11.html
http://www.importantsite.com/great-unique-data-12.html
http://www.importantsite.com/great-unique-data-13.html
http://www.importantsite.com/great-unique-data-14.html
http://www.importantsite.com/great-unique-data-15.html


### Store the full URLs in a new list

#### Instead of just printing the joined URLs, create a new list called `full_url` that holds the full URLs.

In [36]:
## store the updated values
base_url = "http://www.importantsite.com/"
full_url = []
for myURL in myURLS:
    full_url.append(base_url+myURL)


In [37]:
## call the new list
full_url

['http://www.importantsite.com/great-unique-data-1.html',
 'http://www.importantsite.com/great-unique-data-2.html',
 'http://www.importantsite.com/great-unique-data-3.html',
 'http://www.importantsite.com/great-unique-data-4.html',
 'http://www.importantsite.com/great-unique-data-5.html',
 'http://www.importantsite.com/great-unique-data-6.html',
 'http://www.importantsite.com/great-unique-data-7.html',
 'http://www.importantsite.com/great-unique-data-8.html',
 'http://www.importantsite.com/great-unique-data-9.html',
 'http://www.importantsite.com/great-unique-data-10.html',
 'http://www.importantsite.com/great-unique-data-11.html',
 'http://www.importantsite.com/great-unique-data-12.html',
 'http://www.importantsite.com/great-unique-data-13.html',
 'http://www.importantsite.com/great-unique-data-14.html',
 'http://www.importantsite.com/great-unique-data-15.html']
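Plain concatenation works here because `base_url` ends with a slash. If you'd rather not worry about slashes, Python's standard library offers `urllib.parse.urljoin()`, which resolves a relative link against a base URL the way a browser does. A minimal sketch using two of the partial URLs from above; `partial_urls` and `joined` are illustrative names.

```python
from urllib.parse import urljoin

base_url = "http://www.importantsite.com/"
partial_urls = ["great-unique-data-1.html", "great-unique-data-2.html"]

joined = []
for partial in partial_urls:
    # resolve each partial URL relative to the base URL
    joined.append(urljoin(base_url, partial))

print(joined)
# ['http://www.importantsite.com/great-unique-data-1.html',
#  'http://www.importantsite.com/great-unique-data-2.html']
```

One caveat: when the base has no trailing slash, `urljoin()` treats its last path segment as a file name and replaces it, so `urljoin("http://www.importantsite.com/data", "x.html")` returns `http://www.importantsite.com/x.html`, not `.../data/x.html`.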

### 3. Counting while iterating

Often we need to keep count as a `for loop` runs so we can track its progress. It's important to know where we are in the process, especially when dealing with thousands of files or links.

We can easily track our progress using Python's built-in `enumerate()` function.

```python
for i, item in enumerate(items, start=1):
    print(f"List item {i}: {item}")
```
In English: 

“Go through each item in the list, keep track of its position starting at 1, and print both the position and the item.”
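Without `enumerate()`, you'd keep a counter variable yourself and add 1 on every pass. The sketch below shows that manual version next to what `enumerate()` actually produces: a sequence of `(position, item)` pairs, which the `for i, item in ...` line unpacks into two variables.

```python
animals = ["dogs", "cats", "birds", "elephants"]

# Manual counter: start at 1 and add 1 on every pass
i = 1
for animal in animals:
    print(f"List item {i}: {animal}")
    i += 1

# enumerate() does the counting for you, pairing each number with its item
pairs = list(enumerate(animals, start=1))
print(pairs)  # [(1, 'dogs'), (2, 'cats'), (3, 'birds'), (4, 'elephants')]
```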

In [39]:
# without tracking progress
for animal in animals:
    print(f"{animal}")

dogs
cats
birds
elephants


In [40]:
## track progress
for i, animal in enumerate(animals, start = 1):
    print(f"Animal No. {i} is {animal}")

Animal No. 1 is dogs
Animal No. 2 is cats
Animal No. 3 is birds
Animal No. 4 is elephants


In [41]:
animal_sentence = []
for i, animal in enumerate(animals, start = 1):
    animal_sentence.append(f"Animal No. {i} is {animal}")

In [42]:
animal_sentence

['Animal No. 1 is dogs',
 'Animal No. 2 is cats',
 'Animal No. 3 is birds',
 'Animal No. 4 is elephants']
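When a loop runs over thousands of items, printing a line for every one floods the screen. A common pattern is to combine `enumerate()` with the modulo operator `%` so you only report every Nth item. A sketch, assuming a hypothetical list of 2,500 file names and a reporting interval of 1,000:

```python
# Hypothetical file names standing in for thousands of links or files
file_names = []
for n in range(1, 2501):
    file_names.append(f"file-{n}.csv")

report_every = 1_000
progress_messages = []
for i, name in enumerate(file_names, start=1):
    # ... process `name` here ...
    # report on every 1,000th file, plus the final one
    if i % report_every == 0 or i == len(file_names):
        progress_messages.append(f"Processed {i} of {len(file_names)} files")

for message in progress_messages:
    print(message)
```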

## Back to our EMS data challenge

If you don't have enough disk space, I have created <a href="https://raw.githubusercontent.com/sandeepmj/datasets/main/ems-excerpt.csv">an excerpt</a> of the `6+GB` dataset that is `25MB` and holds 100,000 rows of data instead of millions. If you're using the excerpt, your strategy is to take the 100,000-row file and `chunk` it into 10K-row pieces.

With the actual `6+GB` file, we'll break it into 500K-row chunks.
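How many chunks will each approach produce? Divide the row count by the chunk size and round up, since the last chunk can be smaller than the rest. A quick check, using the excerpt's 100,000 rows and the approximate 26 million rows of the full file; `count_chunks` is an illustrative helper, not part of pandas.

```python
import math

def count_chunks(total_rows, chunk_size):
    """Number of chunks needed to cover every row; the last one may be partial."""
    return math.ceil(total_rows / chunk_size)

print(count_chunks(100_000, 10_000))      # 10 chunks for the excerpt
print(count_chunks(26_000_000, 500_000))  # 52 chunks for ~26 million rows
print(count_chunks(26_000_001, 500_000))  # 53: one leftover row needs its own chunk
```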

In [44]:
%pip install pandas

Note: you may need to restart the kernel to use updated packages.


In [45]:
## import libraries
import pandas as pd

In [104]:
## check out the first 12 rows
df_head = pd.read_csv("EMS_Incident_Dispatch_Data_20240407.csv", nrows = 12)
df_head

Unnamed: 0,CAD_INCIDENT_ID,INCIDENT_DATETIME,INITIAL_CALL_TYPE,INITIAL_SEVERITY_LEVEL_CODE,FINAL_CALL_TYPE,FINAL_SEVERITY_LEVEL_CODE,FIRST_ASSIGNMENT_DATETIME,VALID_DISPATCH_RSPNS_TIME_INDC,DISPATCH_RESPONSE_SECONDS_QY,FIRST_ACTIVATION_DATETIME,...,ZIPCODE,POLICEPRECINCT,CITYCOUNCILDISTRICT,COMMUNITYDISTRICT,COMMUNITYSCHOOLDISTRICT,CONGRESSIONALDISTRICT,REOPEN_INDICATOR,SPECIAL_EVENT_INDICATOR,STANDBY_INDICATOR,TRANSFER_INDICATOR
0,110010790,01/01/2011 02:19:47 AM,UNC,2,UNC,2,01/01/2011 02:21:14 AM,Y,87,01/01/2011 02:21:31 AM,...,10030.0,32.0,9.0,110.0,5.0,13.0,N,N,N,N
1,110010791,01/01/2011 02:19:49 AM,EDP,7,EDP,7,,N,0,,...,10029.0,25.0,8.0,111.0,4.0,13.0,N,N,N,N
2,110010792,01/01/2011 02:19:52 AM,UNKNOW,3,UNKNOW,3,01/01/2011 02:25:05 AM,Y,313,01/01/2011 02:25:10 AM,...,10016.0,14.0,4.0,105.0,2.0,12.0,N,N,N,N
3,110010793,01/01/2011 02:19:56 AM,UNC,2,UNC,2,01/01/2011 02:20:31 AM,Y,35,01/01/2011 02:20:37 AM,...,11213.0,77.0,36.0,308.0,17.0,9.0,N,N,N,N
4,110010794,01/01/2011 02:20:05 AM,INJURY,5,INJURY,5,01/01/2011 03:29:04 AM,Y,4139,01/01/2011 03:29:04 AM,...,10022.0,18.0,4.0,105.0,2.0,12.0,N,N,N,N
5,110010795,01/01/2011 02:20:06 AM,UNKNOW,3,UNKNOW,3,01/01/2011 02:21:43 AM,Y,97,01/01/2011 02:21:54 AM,...,11208.0,75.0,42.0,305.0,19.0,8.0,N,N,N,N
6,110010796,01/01/2011 02:20:06 AM,UNKNOW,3,UNKNOW,3,01/01/2011 02:20:15 AM,Y,9,01/01/2011 02:20:25 AM,...,,,,,,,N,N,N,N
7,110010797,01/01/2011 02:20:17 AM,DRUG,4,DRUG,4,01/01/2011 02:40:16 AM,Y,1199,01/01/2011 02:40:27 AM,...,10455.0,41.0,17.0,202.0,8.0,15.0,N,N,N,N
8,110010798,01/01/2011 02:20:24 AM,RESPIR,4,RESPIR,4,01/01/2011 02:23:17 AM,Y,173,01/01/2011 02:23:35 AM,...,10465.0,45.0,13.0,210.0,8.0,14.0,N,N,N,N
9,110010799,01/01/2011 02:20:31 AM,UNC,2,UNC,2,01/01/2011 02:20:52 AM,Y,21,01/01/2011 02:20:59 AM,...,11209.0,68.0,43.0,310.0,20.0,11.0,N,N,N,N


In [106]:
## create a covid related list
related_to_covid = [
    "ABDFC",   # ABDOMINAL PAIN-FEVER & COUGH
    "ABDPFC",  # ABDOMINAL PAIN-FEVER & COUGH
    "ABDPFT",  # ABDOMINAL PAIN FEVER/TRAVEL
    "ALTMFC",  # ALT MENTAL STATUS-FEVER&COUGH
    "ALTMFT",  # ALTERED MENTAL STATUS FEVER/TRAVEL
    "ANAPFC",  # ANAPHYLACTIC SHOCK-FEVER&COUGH
    "ANAPFT",  # ANAPHYLACTIC FEVER/TRAVEL
    "ARREFC",  # CARD OR RESP ARREST-FEVERCOUGH
    "ARREFT",  # CARDIAC ARREST PATIENT FEVER/TRAVEL
    "ASTHFC",  # ASTHMA ATTACK - FEVER&COUGH
    "ASTHFT",  # ASTHMA PATIENT FEVER/TRAVEL
    "ASTHMA",  # ASTHMA A
    "ASTHMB",  # ASTHMA ATTACK
    "ASTHMC",  # ASTHMA CRITICAL
    "ASTHMP",  # ASTHMA ATTACK-PEDS <15 YRS OLD
    "CARDFC",  # CARDIAC CONDITION-FEVER&COUGH
    "CARDFT",  # CARDIAC PATIENT FEVER/TRAVEL CONDITION
    "CHOKFC",  # CHOKING FEVER&COUGH
    "CHOKFT",  # CHOKING PATIENT FEVER/TRAVEL
    "CVACFC",  # STROKE CRITICAL - FEVER&COUGH
    "CVACFT",  # STROKE CRITICAL FEVER/TRAVEL
    "CVAFC",   # STROKE - FEVER & COUGH
    "CVAFT",   # STROKE FEVER/TRAVEL
    "DIFFBC",  # DIFFBC
    "DIFFBR",  # DIFFICULT BREATHER
    "DIFFFC",  # DIFF BREATHING - FEVER&COUGH
    "DIFFFT",  # DIFFICULT BREATHING FEVER/TRAVEL
    "DIFFRF",  # DIFFICULT BREATHER RF
    "DRUGFC",  # HX DRUG OR ALCHL ABUSE-FEV&COU
    "FEVER",   # FEVER
    "INBLFC",  # INTERNAL BLEEDING-FEVER&COUGH
    "INBLFT",  # INTERNAL BLEEDING FEVER/TRAVEL
    "MEDRFC",  # REACTION TO MED - FEVER&COUGH
    "PEDFC",   # SICK PED<5 YRS-FEVER & COUGH
    "PEDFT",   # PEDIATRIC FEVER/TRAVEL
    "PEDRF",   # SICK PED<5 YRS-RASH & FEVER
    "RESPFC",  # RESP DISTRESS - FEVER&COUGH
    "RESPFT",  # RESPIRATORY DISTRESS FEVER/TRAVEL
    "RESPIR",  # RESPIRATORY DISTRESS
    "SEIZFC",  # SEIZURES - FEVER & COUGH
    "SEIZFT",  # SEIZURE PATIENT FEVER/TRAVEL
    "SICKFC",  # SICK - COUGH & FEVER
    "SICKFT",  # SICK PATIENT FEVER/TRAVEL
    "SICKRF",  # SICK - RASH AND FEVER
    "SICPED",  # SICK PEDIATRIC, <5 YEAR OLD
    "SOB",     # SHORTNESS OF BREATH
    "SOBC",    # SOBC
    "STATFC",  # MULT OR PROLONG SEIZUR-FEV&COU
    "STATFT",  # STATUS EPILEPTICUS FEVER/TRAVEL
    "UNCFC",   # UNC PATIENT - FEVER & COUGH
    "UNCFT",   # UNCONSCIOUS FEVER/TRAVEL PATIENT
    "UNCRF"    # UNCONSCIOUS PATIENT-RASH&FEVER
]

In [108]:
## length of the related_to_covid list
len(related_to_covid)

52
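This list will act as a filter: a row is kept only if its `FINAL_CALL_TYPE` appears in `related_to_covid`. Here is a plain-Python sketch of that membership test, using a short excerpt of the list and the `FINAL_CALL_TYPE` values from the first ten rows of the file preview above. Storing the codes in a `set` is an optional speed-up for lookups, not a requirement.

```python
# A few codes from related_to_covid (an excerpt, not the full 52-item list)
covid_codes = {"DIFFBR", "DIFFFC", "FEVER", "RESPIR", "SICKFC"}

# FINAL_CALL_TYPE for the first ten rows of the file preview
final_call_types = ["UNC", "EDP", "UNKNOW", "UNC", "INJURY",
                    "UNKNOW", "UNKNOW", "DRUG", "RESPIR", "UNC"]

matches = []
for position, call_type in enumerate(final_call_types):
    if call_type in covid_codes:  # keep only COVID-related call types
        matches.append((position, call_type))

print(matches)  # [(8, 'RESPIR')]
```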

In [110]:
df_head.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12 entries, 0 to 11
Data columns (total 31 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   CAD_INCIDENT_ID                 12 non-null     int64  
 1   INCIDENT_DATETIME               12 non-null     object 
 2   INITIAL_CALL_TYPE               12 non-null     object 
 3   INITIAL_SEVERITY_LEVEL_CODE     12 non-null     int64  
 4   FINAL_CALL_TYPE                 12 non-null     object 
 5   FINAL_SEVERITY_LEVEL_CODE       12 non-null     int64  
 6   FIRST_ASSIGNMENT_DATETIME       10 non-null     object 
 7   VALID_DISPATCH_RSPNS_TIME_INDC  12 non-null     object 
 8   DISPATCH_RESPONSE_SECONDS_QY    12 non-null     int64  
 9   FIRST_ACTIVATION_DATETIME       10 non-null     object 
 10  FIRST_ON_SCENE_DATETIME         10 non-null     object 
 11  VALID_INCIDENT_RSPNS_TIME_INDC  12 non-null     object 
 12  INCIDENT_RESPONSE_SECONDS_QY    10 non

In [112]:
## read the file in 500K-row chunks and keep only the COVID-related rows from each chunk
row_size = 500_000
df_filtered = []

for i, a_chunk in enumerate(pd.read_csv("EMS_Incident_Dispatch_Data_20240407.csv", 
                                        chunksize = row_size, low_memory= False), start = 1):
    print(f"Analyzing chunk no. {i}")
    df_filtered.append(a_chunk.query("FINAL_CALL_TYPE in @related_to_covid"))

print("Your filtered df is ready to analyze")
    

Analyzing chunk no. 1
Analyzing chunk no. 2
Analyzing chunk no. 3
Analyzing chunk no. 4
Analyzing chunk no. 5
Analyzing chunk no. 6
Analyzing chunk no. 7
Analyzing chunk no. 8
Analyzing chunk no. 9
Analyzing chunk no. 10
Analyzing chunk no. 11
Analyzing chunk no. 12
Analyzing chunk no. 13
Analyzing chunk no. 14
Analyzing chunk no. 15
Analyzing chunk no. 16
Analyzing chunk no. 17
Analyzing chunk no. 18
Analyzing chunk no. 19
Analyzing chunk no. 20
Analyzing chunk no. 21
Analyzing chunk no. 22
Analyzing chunk no. 23
Analyzing chunk no. 24
Analyzing chunk no. 25
Analyzing chunk no. 26
Analyzing chunk no. 27
Analyzing chunk no. 28
Analyzing chunk no. 29
Analyzing chunk no. 30
Analyzing chunk no. 31
Analyzing chunk no. 32
Analyzing chunk no. 33
Analyzing chunk no. 34
Analyzing chunk no. 35
Analyzing chunk no. 36
Analyzing chunk no. 37
Analyzing chunk no. 38
Analyzing chunk no. 39
Analyzing chunk no. 40
Analyzing chunk no. 41
Analyzing chunk no. 42
Analyzing chunk no. 43
Analyzing chunk no. 
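The cell above relies on `pd.read_csv(..., chunksize=...)` returning an iterator of DataFrames rather than one giant DataFrame. Here is a miniature version of the same pattern that runs without the 6+GB file. The CSV text, the shortened `related_to_covid` list and `chunksize=2` are all made up for illustration.

```python
import io

import pandas as pd

# Hypothetical miniature CSV standing in for the 6+GB EMS file
csv_text = """CAD_INCIDENT_ID,FINAL_CALL_TYPE
1,SOB
2,CARD
3,DIFFBR
4,INJURY
5,SICKFC
"""

# A short, illustrative subset of the call-type codes
related_to_covid = ["SOB", "DIFFBR", "SICKFC"]

filtered_chunks = []
# chunksize=2 makes read_csv hand us DataFrames two rows at a time
for i, a_chunk in enumerate(pd.read_csv(io.StringIO(csv_text), chunksize=2), start=1):
    # @related_to_covid refers to the Python variable, not a column
    matches = a_chunk.query("FINAL_CALL_TYPE in @related_to_covid")
    print(f"Chunk {i}: {len(a_chunk)} rows read, {len(matches)} kept")
    filtered_chunks.append(matches)
```

Five rows split into chunks of two give three chunks, and only the matching rows from each are kept in the list.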

In [114]:
## What is df_filtered?

type(df_filtered)

list
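So `df_filtered` is a plain Python list, and each item in it is a DataFrame holding the matching rows from one chunk. A quick sanity check is to count the rows each chunk contributed. The sketch below builds a small hypothetical `df_filtered` by hand; on the real data you would run only the last three lines.

```python
import pandas as pd

# Hypothetical stand-ins for the filtered chunks stored in df_filtered
df_filtered = [
    pd.DataFrame({"FINAL_CALL_TYPE": ["SOB", "DIFFBR"]}),
    pd.DataFrame({"FINAL_CALL_TYPE": []}),  # a chunk with no matching calls
    pd.DataFrame({"FINAL_CALL_TYPE": ["SICKFC"]}),
]

# Rows kept from each chunk, and the total across all chunks
rows_per_chunk = [len(a_chunk) for a_chunk in df_filtered]
print(rows_per_chunk)       # [2, 0, 1]
print(sum(rows_per_chunk))  # 3
```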

In [120]:
## let's look at one item in that list: index 51 is the 52nd chunk
df_filtered[51]

Unnamed: 0,CAD_INCIDENT_ID,INCIDENT_DATETIME,INITIAL_CALL_TYPE,INITIAL_SEVERITY_LEVEL_CODE,FINAL_CALL_TYPE,FINAL_SEVERITY_LEVEL_CODE,FIRST_ASSIGNMENT_DATETIME,VALID_DISPATCH_RSPNS_TIME_INDC,DISPATCH_RESPONSE_SECONDS_QY,FIRST_ACTIVATION_DATETIME,...,ZIPCODE,POLICEPRECINCT,CITYCOUNCILDISTRICT,COMMUNITYDISTRICT,COMMUNITYSCHOOLDISTRICT,CONGRESSIONALDISTRICT,REOPEN_INDICATOR,SPECIAL_EVENT_INDICATOR,STANDBY_INDICATOR,TRANSFER_INDICATOR
25500032,232583708,09/15/2023 06:24:57 PM,SICPED,4,SICPED,4,09/15/2023 06:25:01 PM,Y,4,09/15/2023 06:25:16 PM,...,10457.0,46.0,15.0,205.0,9.0,15.0,N,N,N,N
25500039,232583715,09/15/2023 06:26:45 PM,DIFFBR,2,DIFFBR,2,09/15/2023 06:27:49 PM,Y,64,09/15/2023 06:29:26 PM,...,11217.0,78.0,35.0,302.0,13.0,9.0,N,N,N,N
25500051,232583729,09/15/2023 06:31:03 PM,DIFFBR,2,DIFFBR,2,09/15/2023 06:31:20 PM,Y,17,09/15/2023 06:31:40 PM,...,11239.0,75.0,42.0,305.0,19.0,8.0,N,N,N,N
25500054,232583732,09/15/2023 06:31:12 PM,DIFFFC,2,DIFFFC,2,09/15/2023 06:33:46 PM,Y,154,09/15/2023 06:34:08 PM,...,11217.0,78.0,39.0,302.0,15.0,9.0,N,N,N,N
25500062,232583741,09/15/2023 06:32:51 PM,DIFFBR,2,DIFFBR,2,09/15/2023 06:33:03 PM,Y,12,09/15/2023 06:33:22 PM,...,11361.0,111.0,19.0,411.0,26.0,6.0,N,N,N,N
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
25984627,213414654,12/07/2021 11:52:46 PM,DIFFBR,2,DIFFBR,2,12/07/2021 11:52:53 PM,Y,7,12/07/2021 11:53:19 PM,...,11209.0,68.0,43.0,310.0,20.0,11.0,N,N,N,N
25984636,213414664,12/07/2021 11:54:45 PM,SICKFC,6,SICKFC,6,12/07/2021 11:55:01 PM,Y,16,12/07/2021 11:55:11 PM,...,11432.0,103.0,27.0,412.0,28.0,5.0,N,N,N,N
25984640,213414669,12/07/2021 11:57:53 PM,SICKFC,6,SICKFC,6,12/07/2021 11:58:21 PM,Y,28,12/07/2021 11:58:44 PM,...,11434.0,113.0,28.0,412.0,28.0,5.0,N,N,N,N
25984641,213414670,12/07/2021 11:59:15 PM,DIFFBR,2,DIFFBR,2,12/07/2021 11:59:39 PM,Y,24,12/08/2021 12:00:07 AM,...,10466.0,47.0,12.0,212.0,11.0,16.0,N,N,N,N


In [122]:
## concatenate the list of chunks into one df
## ignore_index=True renumbers the rows from 0
df = pd.concat(df_filtered, ignore_index=True)
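Each filtered chunk keeps the row labels it had in the original file, which is why `df_filtered[51]` above starts at 25500032. Passing `ignore_index=True` to `pd.concat` discards those labels and numbers the combined rows from 0. A small sketch with two hypothetical chunks:

```python
import pandas as pd

# Two hypothetical filtered chunks; each keeps the row labels it had in the source file
chunk_a = pd.DataFrame({"FINAL_CALL_TYPE": ["SOB", "DIFFBR"]}, index=[3, 7])
chunk_b = pd.DataFrame({"FINAL_CALL_TYPE": ["SICKFC"]}, index=[500_004])

kept_labels = pd.concat([chunk_a, chunk_b])
renumbered = pd.concat([chunk_a, chunk_b], ignore_index=True)

print(kept_labels.index.tolist())  # [3, 7, 500004]
print(renumbered.index.tolist())   # [0, 1, 2]
```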


In [124]:
# see df
df

Unnamed: 0,CAD_INCIDENT_ID,INCIDENT_DATETIME,INITIAL_CALL_TYPE,INITIAL_SEVERITY_LEVEL_CODE,FINAL_CALL_TYPE,FINAL_SEVERITY_LEVEL_CODE,FIRST_ASSIGNMENT_DATETIME,VALID_DISPATCH_RSPNS_TIME_INDC,DISPATCH_RESPONSE_SECONDS_QY,FIRST_ACTIVATION_DATETIME,...,ZIPCODE,POLICEPRECINCT,CITYCOUNCILDISTRICT,COMMUNITYDISTRICT,COMMUNITYSCHOOLDISTRICT,CONGRESSIONALDISTRICT,REOPEN_INDICATOR,SPECIAL_EVENT_INDICATOR,STANDBY_INDICATOR,TRANSFER_INDICATOR
0,110010798,01/01/2011 02:20:24 AM,RESPIR,4,RESPIR,4,01/01/2011 02:23:17 AM,Y,173,01/01/2011 02:23:35 AM,...,10465.0,45.0,13.0,210.0,8.0,14.0,N,N,N,N
1,110010814,01/01/2011 02:22:34 AM,DIFFBR,2,DIFFBR,2,01/01/2011 02:24:10 AM,Y,96,01/01/2011 02:24:23 AM,...,10029.0,23.0,8.0,111.0,4.0,13.0,N,N,N,N
2,110010817,01/01/2011 02:22:52 AM,DIFFBR,2,DIFFBR,2,01/01/2011 02:23:51 AM,Y,59,01/01/2011 02:24:45 AM,...,11417.0,106.0,32.0,410.0,27.0,8.0,N,N,N,N
3,110010825,01/01/2011 02:23:57 AM,ASTHMB,2,ASTHMB,2,01/01/2011 02:24:52 AM,Y,55,01/01/2011 02:25:28 AM,...,11365.0,107.0,24.0,408.0,25.0,6.0,N,N,N,N
4,110010828,01/01/2011 02:24:45 AM,DIFFBR,2,DIFFBR,2,01/01/2011 02:27:20 AM,Y,155,01/01/2011 02:27:33 AM,...,10023.0,20.0,6.0,107.0,3.0,10.0,N,N,N,N
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3149561,213414654,12/07/2021 11:52:46 PM,DIFFBR,2,DIFFBR,2,12/07/2021 11:52:53 PM,Y,7,12/07/2021 11:53:19 PM,...,11209.0,68.0,43.0,310.0,20.0,11.0,N,N,N,N
3149562,213414664,12/07/2021 11:54:45 PM,SICKFC,6,SICKFC,6,12/07/2021 11:55:01 PM,Y,16,12/07/2021 11:55:11 PM,...,11432.0,103.0,27.0,412.0,28.0,5.0,N,N,N,N
3149563,213414669,12/07/2021 11:57:53 PM,SICKFC,6,SICKFC,6,12/07/2021 11:58:21 PM,Y,28,12/07/2021 11:58:44 PM,...,11434.0,113.0,28.0,412.0,28.0,5.0,N,N,N,N
3149564,213414670,12/07/2021 11:59:15 PM,DIFFBR,2,DIFFBR,2,12/07/2021 11:59:39 PM,Y,24,12/08/2021 12:00:07 AM,...,10466.0,47.0,12.0,212.0,11.0,16.0,N,N,N,N
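Notice that `INCIDENT_DATETIME` was read as text (`object` in the `info()` output above). To look for a rise in these calls before the mid-March 2020 stay-at-home order, one approach is to convert the column to real datetimes and count incidents per month. This is a sketch on four made-up rows; the format string is inferred from the values shown above (`01/01/2011 02:20:24 AM`).

```python
import pandas as pd

# Hypothetical rows in the same text format as INCIDENT_DATETIME
df = pd.DataFrame({
    "INCIDENT_DATETIME": [
        "01/15/2020 02:20:24 AM",
        "02/03/2020 11:52:46 PM",
        "02/20/2020 06:31:12 PM",
        "03/01/2020 12:00:07 AM",
    ]
})

# An explicit format avoids month/day ambiguity and parses faster on millions of rows
df["INCIDENT_DATETIME"] = pd.to_datetime(df["INCIDENT_DATETIME"], format="%m/%d/%Y %I:%M:%S %p")

# Count incidents per calendar month
monthly_counts = df.groupby(df["INCIDENT_DATETIME"].dt.to_period("M")).size()
print(monthly_counts)
```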


In [53]:
## see the df info



In [54]:
## value count by borough


In [55]:
## percentage by borough
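For the two borough cells, `value_counts()` gives raw counts and `value_counts(normalize=True)` gives shares. This sketch assumes the borough column is named `BOROUGH`; that column is hidden behind the `...` in the output above, so check `df.columns` before relying on the name. The data is made up.

```python
import pandas as pd

# Hypothetical data; "BOROUGH" is an assumed column name
df = pd.DataFrame({"BOROUGH": ["BRONX", "BROOKLYN", "BROOKLYN", "QUEENS"]})

# Raw counts per borough, largest first
counts = df["BOROUGH"].value_counts()
print(counts)

# Share of incidents per borough, as percentages
percentages = df["BOROUGH"].value_counts(normalize=True) * 100
print(percentages.round(1))
```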


In [56]:
## iterate through all the files and pull out "final severity levels between 6 and 7 inclusive" incidents only
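One way to approach this filter, assuming you read the single CSV in chunks as before (rather than separate files), is to combine the chunk loop with `Series.between()`, which includes both endpoints by default. The miniature CSV below is hypothetical.

```python
import io

import pandas as pd

# Hypothetical miniature CSV in place of the full EMS file
csv_text = """CAD_INCIDENT_ID,FINAL_SEVERITY_LEVEL_CODE
1,2
2,6
3,7
4,4
5,6
"""

severe_chunks = []
for a_chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    # between() is inclusive on both ends by default, so levels 6 and 7 are both kept
    severe_chunks.append(a_chunk[a_chunk["FINAL_SEVERITY_LEVEL_CODE"].between(6, 7)])

severe = pd.concat(severe_chunks, ignore_index=True)
print(severe)
```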


In [57]:
## see a sample, with random source list
