## Message extraction from multiple `.msg` files

This analysis extracts both metadata, such as sender, recipient or date, and information from the body of several emails that were provided in `.msg` format through a freedom of information request.

In [1]:
# —————— libraries that need to be installed, which you can do via pip ———————

import pandas as pd # to use pandas to process data
import extract_msg # to extract messages

# —————— libraries built into Python ———————
import csv # to write and read csv
import glob # to access file paths

In [2]:
paths = glob.glob("../data/neighbors_data/brookhaven/*.msg")

In [3]:
paths[0:5]

['../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident (48).msg',
 '../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T124708.078.msg',
 '../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T154502.478.msg',
 '../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T154659.839.msg',
 '../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T154659.187.msg']

#### Extract data from the messages and structure it
- extract `subject`, `date`, `sender` and `body` using `extract_msg` tools
- clean body (remove characters such as the zero-width non-joiner `\u200c` and tabs `\t`)
- split body text into a list of information that repeats in each message


We will start by looking at one message:

In [4]:
msg = extract_msg.openMsg(paths[0])
print(msg)

<extract_msg.message.Message object at 0x113e76fa0>


Now we can use various attributes from `msg-extractor` to examine the message, including:
- `msg.subject`
- `msg.date`
- `msg.sender`
- `msg.to`

The next two cells test the extraction, meaning we will try to isolate each part of the message into individual entries.

In [5]:
msg = extract_msg.openMsg(paths[0])

print(
    msg.subject,
    msg.date,
    msg.sender,
    msg.to,
    msg.cc,
)

A Resident Posted a Crime Incident Wed, 17 Nov 2021 18:16:36 -0500 Ring Team <no-reply@neighborhoods.ring.com> andrea.serrano@brookhavenga.gov None
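
The `date` and `sender` values printed above are plain strings, and in the results table further down the sender appears both with and without quotes around the display name. As a minimal sketch, assuming `msg.date` returns the RFC 2822-style string shown in the output (some versions of the library may return a `datetime` instead), the standard library's `email.utils` can turn both fields into structured values. The helper name `normalize_headers` is hypothetical.

```python
from email.utils import parseaddr, parsedate_to_datetime


def normalize_headers(date_str, sender_str):
    """Return (datetime, display_name, email_address) from raw header strings."""
    sent_at = parsedate_to_datetime(date_str)  # timezone-aware datetime
    name, address = parseaddr(sender_str)      # drops optional quotes around the name
    return sent_at, name, address


sent_at, name, address = normalize_headers(
    "Wed, 17 Nov 2021 18:16:36 -0500",
    '"Ring Team" <no-reply@neighborhoods.ring.com>',
)
print(sent_at.isoformat(), name, address)
```

With the sender reduced to a name and an address, the quoted and unquoted variants compare as equal.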


The following cell prints the body text to understand what information is contained in each message.

In [6]:
msg.body

'Post Titled: Stolen Package at Berkshire at Lenox Park\r\n \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c   \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c   \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c   \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c   \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c   \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c   \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c   \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c   \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c   \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c \u200c   \u200c \u2

### Cleaning the body text and turning it into data
This part takes the body of the message and structures it into categories that repeat in each message. We do this by splitting the body text into a list of strings at each line break (denoted as `\r\n`).

In [7]:
msg_body_clean = msg.body.replace("\u200c","").replace("\t", "").replace("  ","").strip()
message_items = msg_body_clean.strip().split("\r\n")

In [8]:
list(filter(None, message_items))

['Post Titled: Stolen Package at Berkshire at Lenox Park',
 ' ',
 'Neighbors Public Safety Service <https://links.neighborhoods.ring.com/ls/click?upn=FHVCVoLBYI7Dvf39yZ-2F5txav887QW1brgG-2F-2BJ99vpUd9zicH1H3TQWs2jOlo2pRKZyCz_ZtyLTlYa78bQffWNrIlGCwpMGyVBaXuuNm8hgEP81dRrNOc3xTsZxvuBgf9WVghBAMoKXX0egCKADIg4efJgHqlW8RizFbFvtUi8BTPHf1O54ekPNt9LcY0yj5Vl9uLKliYMUbwIcdYvzHiQU4gsOLIHcE7SNUIV6yNi5dxLjZs2G4vSaFgtw-2FUUKiZMfO3yITXTgvVngsq8aWMaGeycEgmR4Pp-2F7xz3f-2FZm-2FaTC7brkfyugjNsCwWnFoVQS7Y-2FKm3qu6miJoFsF56n-2F0We-2B2TbcAZzIdBOYBoXxQi7KKbmpkFVEU0Zpi9FRHp6zasFI68A3xyIT4eDGSTR3hV1sbqRZkKTGJxe3ecz4-2FhKwUrSr-2B0JTjHhvUYYSmVMal5evwmrSWXE2Rb-2BNjtzlrfnc44bCwRZzndHa-2BMk8KcWSsqbcOSR8lJsfaJofpR-2FJVqaZXtzZOwgT66EIHVlMB3F1lN3b6QN-2FmVV-2Fplx-2FzSOpLyHWrP-2B53QIBf2m4XKYkbcXmw-2FQMAc1TKORB17GhgDnbNkV8Pkp1YuuirgbKIoKevqhPgnb93UK1fXTthbLe-2Fgs8wUf7eocEDRIrC5ULf2n8N5Na-2FClbOi-2B4RmProLdA2NFB43jjSfOxQMeVSp0RyIXPlm6nPvMd8dLZ3CRHpWNWRXr8I0A-2FqL369dbeUCTQe1YS-2Bg-2BS6pMLvFgaWr8kyiG-2F>',
 'A resident in you

In [9]:
message_items_clean  = list(filter(None, message_items))
len(message_items_clean)

24
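
Note that `filter(None, ...)` only drops empty strings, so the whitespace-only item `' '` at index 1 survives, which is why the field mapping in the next cell skips that index. A hedged alternative, using a hypothetical helper `split_body`, strips every line first so that only lines with visible text remain. This shifts the positions of later items, so the indices used in the following cells would need adjusting.

```python
def split_body(body):
    """Split a message body into non-empty, stripped lines."""
    cleaned = body.replace("\u200c", "").replace("\t", "")
    # splitlines() handles "\r\n" as well as bare "\n"
    return [line.strip() for line in cleaned.splitlines() if line.strip()]


# Synthetic sample body, not taken from the data set
sample = "Post Titled: Stolen Package\r\n \u200c \u200c \r\nStolen package\r\n"
print(split_body(sample))
```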

In [10]:
{
    "body_title_top"           : message_items_clean[0].replace("Post Titled: ","").strip(),
    "body_link1_title"         : message_items_clean[2].strip(),
    "body_post_classification" : message_items_clean[3].strip(),
    "body_title"               : message_items_clean[4].strip(),
    "body_date"                : message_items_clean[5].strip(),
    "body_description"         : message_items_clean[6].strip(),
    "body_link2_title"         : message_items_clean[7].strip(),
    "body_link3_title"         : message_items_clean[9].strip(),

    
    
}

{'body_title_top': 'Stolen Package at Berkshire at Lenox Park',
 'body_link1_title': 'Neighbors Public Safety Service <https://links.neighborhoods.ring.com/ls/click?upn=FHVCVoLBYI7Dvf39yZ-2F5txav887QW1brgG-2F-2BJ99vpUd9zicH1H3TQWs2jOlo2pRKZyCz_ZtyLTlYa78bQffWNrIlGCwpMGyVBaXuuNm8hgEP81dRrNOc3xTsZxvuBgf9WVghBAMoKXX0egCKADIg4efJgHqlW8RizFbFvtUi8BTPHf1O54ekPNt9LcY0yj5Vl9uLKliYMUbwIcdYvzHiQU4gsOLIHcE7SNUIV6yNi5dxLjZs2G4vSaFgtw-2FUUKiZMfO3yITXTgvVngsq8aWMaGeycEgmR4Pp-2F7xz3f-2FZm-2FaTC7brkfyugjNsCwWnFoVQS7Y-2FKm3qu6miJoFsF56n-2F0We-2B2TbcAZzIdBOYBoXxQi7KKbmpkFVEU0Zpi9FRHp6zasFI68A3xyIT4eDGSTR3hV1sbqRZkKTGJxe3ecz4-2FhKwUrSr-2B0JTjHhvUYYSmVMal5evwmrSWXE2Rb-2BNjtzlrfnc44bCwRZzndHa-2BMk8KcWSsqbcOSR8lJsfaJofpR-2FJVqaZXtzZOwgT66EIHVlMB3F1lN3b6QN-2FmVV-2Fplx-2FzSOpLyHWrP-2B53QIBf2m4XKYkbcXmw-2FQMAc1TKORB17GhgDnbNkV8Pkp1YuuirgbKIoKevqhPgnb93UK1fXTthbLe-2Fgs8wUf7eocEDRIrC5ULf2n8N5Na-2FClbOi-2B4RmProLdA2NFB43jjSfOxQMeVSp0RyIXPlm6nPvMd8dLZ3CRHpWNWRXr8I0A-2FqL369dbeUCTQe1YS-2Bg-2BS6pMLvFgaWr8kyiG-2F>',
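
Several of these items combine a visible label with a tracking URL in angle brackets. As a rough sketch, assuming every link follows the `label <url>` pattern seen above, a regular expression can separate the two parts. The helper `split_link` and the `example.com` URLs are placeholders for illustration.

```python
import re

# Matches "Some label <https://...>"; the label may be empty
LINK_PATTERN = re.compile(r"^(?P<label>.*?)\s*<(?P<url>https?://[^>]+)>$")


def split_link(item):
    """Return (label, url); url is None when the item has no <...> link."""
    match = LINK_PATTERN.match(item.strip())
    if match is None:
        return item.strip(), None
    return match.group("label"), match.group("url")


print(split_link("Click Here to View Post <https://example.com/post>"))
print(split_link("<https://example.com/image.png>"))
print(split_link("Have questions?"))
```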


In [11]:
msg_data = []
for path in paths: 
    print(path)
    # open file
    msg = extract_msg.openMsg(path)
    # remove zero-width characters, tabs and double spaces from the body, then split it into a list of items at each line break
    msg_body_clean = msg.body.replace("\u200c","").replace("\t", "").replace("  ","").strip()
    message_items = msg_body_clean.split("\r\n")
    message_items_clean  = list(filter(None, message_items))

    
    # make a data dictionary that holds all information
    msg_info={
        "subject"      : msg.subject,
        "date"         : msg.date,
        "sender"       : msg.sender,
        "to"           : msg.to,
        "cc"           : msg.cc,
        "body_title_top"           : message_items_clean[0].replace("Post Titled: ","").strip(),
        "body_link1_title"         : message_items_clean[2].strip(),
        "body_post_classification" : message_items_clean[3].strip(),
        "body_title"               : message_items_clean[4].strip(),
        "body_date"                : message_items_clean[5].strip(),
        "body_description"         : message_items_clean[6].strip(),
        "body_link2_title"         : message_items_clean[7].strip(),
        "body_link3_title"         : message_items_clean[9].strip(),
        "body_full"                : msg.body.replace("\u200c","").replace("\r\n","").replace("\t",""),
        "file_name"                : path
    }
    msg_data.append(msg_info)
    


../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident (48).msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T124708.078.msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T154502.478.msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T154659.839.msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T154659.187.msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T154425.795.msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T154425.971.msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T154330.703.msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T155348.056.msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T154729.181.msg
../data/neighbors_data/brookhaven/A Resident Post

../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident (4).msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident (32).msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T124637.779.msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T154957.368.msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T155119.553.msg
../data/neighbors_data/brookhaven/New Crime Incident (17).msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident (65).msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T154926.644.msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T154330.699.msg
../data/neighbors_data/brookhaven/New Crime Incident (40).msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T154729.812.msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime I

../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident (2).msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident (34).msg
../data/neighbors_data/brookhaven/New Crime Incident (11).msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T155454.111.msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T154303.033.msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T154329.573.msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident (63).msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T155119.043.msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T154956.325.msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident (75).msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T155452.587.msg
../data/neighbors_data/brookhaven/A Resident 

../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident (76).msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T155348.215.msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T155118.843.msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T154729.498.msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T155452.979.msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T124709.438.msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T155118.664.msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident (21).msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T155348.599.msg
../data/neighbors_data/brookhaven/New Crime Incident (7).msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T124708.602.msg
../data

../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T154658.983.msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T155420.105.msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T154329.898.msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T154957.985.msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident 6.msg
../data/neighbors_data/brookhaven/New Crime Incident.msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T154926.649.msg
../data/neighbors_data/brookhaven/New Crime Incident (38).msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident 10.msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T155120.122.msg
../data/neighbors_data/brookhaven/New Crime Incident (18).msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T1545

../data/neighbors_data/brookhaven/New Crime Incident (15).msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T154502.475.msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident (30).msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T154926.472.msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident - 2022-07-14T154955.715.msg
../data/neighbors_data/brookhaven/A Resident Posted a Crime Incident (6).msg
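
The loop indexes `message_items_clean` up to position 9, so a message with fewer items would raise an `IndexError` and stop the run. A minimal sketch of a guard, using a hypothetical helper `item_at`, returns a default value instead:

```python
def item_at(items, index, default=None):
    """Return the stripped item at `index`, or `default` if the list is too short."""
    try:
        return items[index].strip()
    except IndexError:
        return default


# Synthetic, shortened list of body items
items = ["Post Titled: Car", " ", "Neighbors Public Safety Service "]
print(item_at(items, 2))  # existing position
print(item_at(items, 9))  # position beyond the end of the list
```

Inside the loop, an entry such as `message_items_clean[9].strip()` could then be written as `item_at(message_items_clean, 9)`.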


In [12]:
msg_extracts = pd.DataFrame(msg_data)

msg_extracts

Unnamed: 0,subject,date,sender,to,cc,body_title_top,body_link1_title,body_post_classification,body_title,body_date,body_description,body_link2_title,body_link3_title,body_full,file_name
0,A Resident Posted a Crime Incident,"Wed, 17 Nov 2021 18:16:36 -0500",Ring Team <no-reply@neighborhoods.ring.com>,andrea.serrano@brookhavenga.gov,,Stolen Package at Berkshire at Lenox Park,Neighbors Public Safety Service <https://links...,A resident in your area just posted a crime in...,Stolen Package at Berkshire at Lenox Park,"November 17, 2021",Stolen package,Click Here to View Post <https://links.neighbo...,<https://ring.widen.net/content/yxyweylxpc/png...,Post Titled: Stolen Package at Berkshire at Le...,../data/neighbors_data/brookhaven/A Resident P...
1,A Resident Posted a Crime Incident,"Mon, 17 May 2021 08:38:51 -0400",Ring Team <no-reply@neighborhoods.ring.com>,travis.lewis@brookhavenga.gov,,Car,Neighbors Public Safety Service <https://links...,A resident in your area just posted a crime in...,Car,"May 17, 2021",2 am someone checking my car,Click Here to View Post <https://links.neighbo...,Check Out Your Feed <https://links.neighborhoo...,Post Titled: Car ...,../data/neighbors_data/brookhaven/A Resident P...
2,A Resident Posted a Crime Incident,"Thu, 20 May 2021 23:47:46 -0400","""Ring Team"" <no-reply@neighborhoods.ring.com>",andrea.serrano@brookhavenga.gov,,One or two people checking for unlocked car do...,Neighbors Public Safety Service <https://links...,A resident in your area just posted a crime in...,One or two people checking for unlocked car do...,"May 21, 2021",Click Here to View Post <https://links.neighbo...,See the full list of crime and safety incident...,Have questions?,Post Titled: One or two people checking for un...,../data/neighbors_data/brookhaven/A Resident P...
3,A Resident Posted a Crime Incident,"Sat, 09 Oct 2021 07:09:43 -0400",Ring Team <no-reply@neighborhoods.ring.com>,robert.orange@brookhavenga.gov,,Parked Cars destroyed at Briarhill,Neighbors Public Safety Service <https://links...,A resident in your area just posted a crime in...,Parked Cars destroyed at Briarhill,"October 9, 2021",Three parked cars near the 1200 building at Br...,Click Here to View Post <https://links.neighbo...,Check Out Your Feed <https://links.neighborhoo...,Post Titled: Parked Cars destroyed at Briarhil...,../data/neighbors_data/brookhaven/A Resident P...
4,A Resident Posted a Crime Incident,"Thu, 10 Jun 2021 07:49:36 -0400","""Ring Team"" <no-reply@neighborhoods.ring.com>",travis.lewis@brookhavenga.gov,,Checking cars again in Peachtree creek townshi...,Neighbors Public Safety Service <https://links...,A resident in your area just posted a crime in...,Checking cars again in Peachtree creek townshi...,"June 10, 2021",Checking cars again in Peachtree creek townshi...,Click Here to View Post <https://links.neighbo...,Check Out Your Feed <https://links.neighborhoo...,Post Titled: Checking cars again in Peachtree ...,../data/neighbors_data/brookhaven/A Resident P...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
489,A Resident Posted a Crime Incident,"Fri, 03 Sep 2021 19:34:46 -0400",Ring Team <no-reply@neighborhoods.ring.com>,sarah.miller@brookhavenga.gov,,Package thief,Neighbors Public Safety Service <https://links...,A resident in your area just posted a crime in...,Package thief,"September 3, 2021",We had ups packages delivered 6 minutes prior ...,Click Here to View Post <https://links.neighbo...,Check Out Your Feed <https://links.neighborhoo...,Post Titled: Package thief ...,../data/neighbors_data/brookhaven/A Resident P...
490,A Resident Posted a Crime Incident,"Thu, 20 May 2021 23:47:47 -0400",Ring Team <no-reply@neighborhoods.ring.com>,travis.nguyen@brookhavenga.gov,,One or two people checking for unlocked car do...,Neighbors Public Safety Service <https://links...,A resident in your area just posted a crime in...,One or two people checking for unlocked car do...,"May 21, 2021",Click Here to View Post <https://links.neighbo...,See the full list of crime and safety incident...,Have questions?,Post Titled: One or two people checking for un...,../data/neighbors_data/brookhaven/A Resident P...
491,A Resident Posted a Crime Incident,"Thu, 20 May 2021 23:47:57 -0400",Ring Team <no-reply@neighborhoods.ring.com>,sarah.miller@brookhavenga.gov,,One or two people checking for unlocked car do...,Neighbors Public Safety Service <https://links...,A resident in your area just posted a crime in...,One or two people checking for unlocked car do...,"May 21, 2021",Click Here to View Post <https://links.neighbo...,See the full list of crime and safety incident...,Have questions?,Post Titled: One or two people checking for un...,../data/neighbors_data/brookhaven/A Resident P...
492,A Resident Posted a Crime Incident,"Tue, 02 Mar 2021 03:51:24 -0500","""Ring Team"" <no-reply@neighborhoods.ring.com>",carlai.moore@brookhavenga.gov,,Car break ins - Lenox Park neighborhood,Neighbors Public Safety Service <https://links...,A resident in your area just posted a crime in...,Car break ins - Lenox Park neighborhood,"March 2, 2021",3 black men - white SUV breaking into cars in ...,Click Here to View Post <https://links.neighbo...,Check Out Your Feed <https://links.neighborhoo...,Post Titled: Car break ins - Lenox Park neighb...,../data/neighbors_data/brookhaven/A Resident P...
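
In rows 2, 490 and 491 the `body_description` column holds the "Click Here to View Post" link, which suggests that these posts had no description and every later field moved up by one position. As a rough sketch, assuming that link text always marks the item after the description, such rows can be flagged in the list of dictionaries before the DataFrame is built. The function name `flag_shifted` and the key `fields_shifted` are hypothetical.

```python
def flag_shifted(records, marker="Click Here to View Post"):
    """Add a boolean `fields_shifted` key to each record (modifies in place)."""
    for record in records:
        description = record.get("body_description") or ""
        record["fields_shifted"] = description.startswith(marker)
    return records


# Synthetic records shaped like the entries in msg_data
sample = [
    {"body_description": "Stolen package"},
    {"body_description": "Click Here to View Post <https://example.com/post>"},
]
for record in flag_shifted(sample):
    print(record["fields_shifted"])
```

Flagged rows can then be reviewed by hand or re-parsed with adjusted positions.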


In [13]:
msg_extracts.to_csv("../output/msg_extracts.csv", index=False)