<img width="10%" alt="Naas" src="https://landen.imgix.net/jtci2pxwjczr/assets/5ice39g4.png?w=160"/>

# WhatsApp - Extract chat from WhatsApp and return a dataframe

**Tags:** #python #pandas #regex #whatsapp #chats


**Author:** [Mohit Singh](https://www.linkedin.com/in/mohwits/)

**Description:** This notebook provides a step-by-step guide to extract a chat from WhatsApp and load it into a pandas dataframe.

## Input

### Import libraries

In [1]:
import re
import pandas as pd

### Setup Variables

* **Chats as txt file:**
  - `First, make sure your device uses the 24-hour time format. If it does not, change it in your device settings and restart the device.`
  - `Open the WhatsApp application on your device.`
  - `Open the chat you want to extract (say, "work group"), tap the three-dot icon, select 'More', then 'Export chat', and choose 'Without media'.`
  - `After a few seconds you will get the option to share the file. Save it to your drive or choose your preferred option.`

* **sender**: `name of person or the group`
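
Because the parsing below relies on the 24-hour format, it can help to check the exported text before going further. The following is a minimal sketch, not part of the original notebook: the helper name `looks_like_24h_export`, the `sample_size` parameter, and the sample lines are all hypothetical, and the assumed line shape is `DD/MM/YYYY, HH:MM - Name: text`.

```python
import re

# Assumed shape of an exported line: "DD/MM/YYYY, HH:MM - Name: text" (24-hour clock).
TIMESTAMP_24H = re.compile(r'^\d{1,2}/\d{1,2}/\d{2,4},\s\d{1,2}:\d{2}\s-\s')


def looks_like_24h_export(text, sample_size=20):
    """Return True if any of the first non-empty lines starts with a 24-hour timestamp."""
    lines = [line for line in text.splitlines() if line.strip()][:sample_size]
    return any(TIMESTAMP_24H.match(line) for line in lines)


# Made-up sample text, for illustration only.
sample_24h = "01/04/2016, 08:04 - John: Good morning\n01/04/2016, 21:15 - Lisa: Hi"
sample_12h = "01/04/2016, 8:04 am - John: Good morning\n01/04/2016, 9:15 pm - Lisa: Hi"

print(looks_like_24h_export(sample_24h))  # True
print(looks_like_24h_export(sample_12h))  # False
```

If the check returns `False`, the export was probably made with a 12-hour clock (or a different locale), and the regex in the Model section would not match any message.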

In [2]:
## Open the chats file and read it into a variable
with open('../../Chats/work.txt', 'r', encoding='utf-8') as chat_file:
    work = chat_file.read()

In [3]:
## name of sender/group
sender = "work group"

## Model

In [4]:
## Regex pattern matching the date and time prefix of each message
pattern = r'\d{1,2}/\d{1,2}/\d{2,4},\s\d{1,2}:\d{2}\s-\s'
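
To see how this pattern is used, the sketch below runs `re.findall` and `re.split` on a two-line sample. The sample text is made up for illustration. The first element returned by `re.split` is whatever precedes the first timestamp (an empty string here), which is why it is dropped with `[1:]`.

```python
import re

pattern = r'\d{1,2}/\d{1,2}/\d{2,4},\s\d{1,2}:\d{2}\s-\s'

# Made-up sample in the assumed export format.
sample = (
    "01/04/2016, 09:00 - John: Good morning\n"
    "01/04/2016, 09:05 - Lisa: Thanks\n"
)

# Every timestamp prefix, in order of appearance.
dates = re.findall(pattern, sample)
# The text between timestamps; drop the empty piece before the first one.
messages = re.split(pattern, sample)[1:]

print(dates)     # ['01/04/2016, 09:00 - ', '01/04/2016, 09:05 - ']
print(messages)  # ['John: Good morning\n', 'Lisa: Thanks\n']
```

Both lists have the same length and the same order, so they can be placed side by side as dataframe columns.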

In [5]:
## function to create dataframe
def create_dataframe(file, sender):
  ## splitting message and dates
  messages = re.split(pattern, file)[1:]
  dates = re.findall(pattern, file)

  df = pd.DataFrame({'Message': messages, 'Date':dates, 'Sender' : sender})
  ## convert the Date column to datetime (expects day/month/4-digit year)
  df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y, %H:%M - ')

  ## list for user
  users = []
  ## list for messages
  messages = []

  ## iterating in messages
  for message in df['Message']:
      ## splitting message at the first ': ' into user name and text
      entry = re.split(r'([\w\W]+?):\s', message, maxsplit=1)
      
      ## if it has value after ':'
      if entry[1:]: # user name
          ## appending users
          users.append(entry[1])
          ## appending message without the trailing newline
          messages.append(entry[2].rstrip('\n'))
      
      ## if it does not have a user name
      else:
          ## appending name as 'Notification'
          users.append('Notification')
          ## appending message without the trailing newline
          messages.append(entry[0].rstrip('\n'))
          
  ## dropping previous message column
  df.drop('Message', axis = 1, inplace=True)

  ## adding user column with user names
  df['User'] = users
  ## adding message column with just message
  df['Message'] = messages

  ## Separating Year, month, day, hour and minute from Date
  ## and creating specific column
  df['Year'] = df['Date'].dt.year
  df['Month'] = df['Date'].dt.month_name()
  df['Day'] = df['Date'].dt.day
  df['Hour'] = df['Date'].dt.hour
  df['Minute'] = df['Date'].dt.minute

  return df
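
The user/message split inside the loop can be tried on its own. This is a minimal sketch under stated assumptions: the helper name `split_user_and_message` is hypothetical, `maxsplit=1` is used so that a later `': '` inside the text does not cut the message short, and `rstrip('\n')` stands in for the notebook's `[:-1]` slice.

```python
import re


def split_user_and_message(raw):
    """Split 'Name: text' into (user, text); lines without 'Name: ' are notifications."""
    # maxsplit=1 keeps any later ': ' inside the message text.
    entry = re.split(r'([\w\W]+?):\s', raw, maxsplit=1)
    if entry[1:]:
        # entry is ['', user, text] when a user name was found.
        return entry[1], entry[2].rstrip('\n')
    # No 'Name: ' prefix, e.g. a system notification.
    return 'Notification', entry[0].rstrip('\n')


# Made-up sample messages, for illustration only.
print(split_user_and_message('John: Meeting at 10: room B\n'))
# ('John', 'Meeting at 10: room B')
print(split_user_and_message('You created group "Work"\n'))
# ('Notification', 'You created group "Work"')
```

Note that a notification whose text happens to contain `': '` would still be read as a user message. Handling that case would need a list of known participants, which is outside this sketch.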

## Output

In [6]:
## calling create_dataframe function
work_df = create_dataframe(work, sender)

In [7]:
## display dataframe
work_df

| | Date | Sender | User | Message | Year | Month | Day | Hour | Minute |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2016-04-01 08:04:00 | work group | Notification | You created group "Work" | 2016 | April | 1 | 8 | 4 |
| 1 | 2016-04-01 09:00:00 | work group | John | Good morning, everyone! Just a reminder that t... | 2016 | April | 1 | 9 | 0 |
| 2 | 2016-04-01 09:05:00 | work group | Lisa | Thanks for the reminder, John! | 2016 | April | 1 | 9 | 5 |
| 3 | 2016-04-01 10:02:00 | work group | Sarah | Are we discussing the new project during the m... | 2016 | April | 1 | 10 | 2 |
| 4 | 2016-04-01 10:05:00 | work group | John | Yes, Sarah. We'll go over the details and assi... | 2016 | April | 1 | 10 | 5 |
| 5 | 2016-04-01 10:10:00 | work group | Mark | I won't be able to attend the meeting. Can som... | 2016 | April | 1 | 10 | 10 |
| 6 | 2016-04-01 10:12:00 | work group | Lisa | Sure, Mark. I'll share the meeting notes with ... | 2016 | April | 1 | 10 | 12 |
| 7 | 2016-04-02 14:30:00 | work group | Sarah | Great job on completing the project, team! Let... | 2016 | April | 2 | 14 | 30 |
| 8 | 2016-06-02 14:35:00 | work group | Lisa | Sounds good, Sarah. Any suggestions for the ce... | 2016 | June | 2 | 14 | 35 |
| 9 | 2016-06-02 14:40:00 | work group | John | How about a team lunch at the new Italian rest... | 2016 | June | 2 | 14 | 40 |
