## Objectives
- Create random names for the message senders
- Create random message IDs for the messages
- Format table fields to mimic database schema

## 1. Read data and get Kenyan Names 

In [43]:
import pandas as pd
from faker import Faker 
import random
import uuid
from datetime import datetime

In [44]:
df = pd.read_csv('GeneralistRails_Project_MessageData.csv')
df.head(10)

| | User ID | Timestamp (UTC) | Message Body |
|---|---|---|---|
| 0 | 208 | 2017-02-01 19:29:05 | So it means if u pay ua loan before the due da... |
| 1 | 208 | 2017-02-01 19:21:58 | The dates of payment are still indicated n no ... |
| 2 | 208 | 2017-02-01 19:21:18 | Why was my application rejected |
| 3 | 208 | 2017-02-01 19:05:45 | Hi branch I requested my number to remain the ... |
| 4 | 218 | 2017-02-01 16:08:21 | I said ill pay 5th esther camoon.. Infact you ... |
| 5 | 218 | 2017-02-01 14:07:37 | I will pay on sunday of 5th and i will pay al... |
| 6 | 218 | 2017-02-01 12:07:07 | I have a late source of salary i expected but ... |
| 7 | 444 | 2017-02-02 15:57:36 | I will clear my loan before 15nth,kindly bear ... |
| 8 | 676 | 2017-02-03 14:23:45 | Hi can i get the batch number |
| 9 | 676 | 2017-02-03 14:23:25 | Hi can i get the batch number pl |


In [45]:
backup = df.copy()

df.shape

(100, 3)

In [46]:
# Soft reset point
df = backup.copy()
df.shape

(100, 3)

We use Faker to generate first names and combine each one with a surname drawn from a list of Kenyan ethnic names.

In [47]:
kenyanNames = pd.read_json('kenyan_names.json')
# kenyanNames.shape
kenyanNames['names'][1]

'Kiplagat'

In [48]:
fk = Faker()
# first_name() avoids picking up a prefix such as "Dr." that name() can include
fk.first_name()

'Krista'

## 2. Generating fake names for users

In [49]:
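def generateName():
    # Faker first name + a surname picked at random from the Kenyan names list.
    # random.choice avoids hard-coding the list length (previously randint(0, 84)).
    surname = random.choice(kenyanNames['names'].tolist())
    return fk.first_name() + ' ' + surname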

Next, iterate through the dataset and add the names.
Names are cached by user ID so that every message from the same user gets the same name.

In [50]:
names = {}
for index, row in df.iterrows():
    user_id = row['User ID']
    if user_id not in names:
        names[user_id] = generateName()  # Generate a name if not already generated for this user ID
    df.at[index, 'Name'] = names[user_id]

df.sample(10)

| | User ID | Timestamp (UTC) | Message Body | Name |
|---|---|---|---|---|
| 71 | 4442 | 2017-02-02 14:22:54 | I require a feedback plz | Patrick Kiptanui |
| 97 | 8392 | 2017-02-02 13:51:56 | I been with u For long an I made amistake but ... | April Kosgei |
| 75 | 4708 | 2017-02-02 5:47:09 | I cleared last year for how long | Angela Ouma |
| 78 | 5480 | 2017-02-03 12:28:31 | Hi branch, Yes I have a problem which I thoug... | Laurie Oluoch |
| 87 | 7140 | 2017-02-02 13:06:55 | Why cant i have a loan now yet i have cleared ... | Jennifer Obiero |
| 27 | 2035 | 2017-02-02 5:59:11 | What am i supposed to do after paying in order... | Matthew Muthoni |
| 23 | 1481 | 2017-02-03 1:52:01 | Hello. Why can't you make the loan payment opt... | Erik Kosgei |
| 21 | 1354 | 2017-02-02 21:33:40 | Thank you for the loans i have benefitted from... | Joanna Oluoch |
| 30 | 2126 | 2017-02-01 15:52:19 | And have no current loan... Im upto date ... | Jake Owiti |
| 49 | 3112 | 2017-02-03 8:58:06 | Within aweek,specifically when plz | James Kiplagat |
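
The caching idea above can be tried without Faker or the dataset. The sketch below is illustrative only: `FIRST_NAMES`, `SURNAMES`, and `assign_names` are hypothetical stand-ins (not part of the notebook), and a seeded `random.Random` is used so the same input always produces the same names.

```python
import random

# Hypothetical stand-ins for Faker first names and the kenyan_names.json list.
FIRST_NAMES = ["Amber", "Jake", "Diane", "Peter"]
SURNAMES = ["Kiplagat", "Wambui", "Ochieng", "Kosgei"]


def assign_names(user_ids, seed=0):
    """Return a {user_id: name} mapping with one name per distinct user ID."""
    rng = random.Random(seed)  # local generator, so results are reproducible
    names = {}
    for user_id in user_ids:
        if user_id not in names:
            # Only the first sighting of a user ID generates a name.
            names[user_id] = f"{rng.choice(FIRST_NAMES)} {rng.choice(SURNAMES)}"
    return names


user_ids = [208, 208, 218, 208, 444]
names = assign_names(user_ids)
labelled = [(uid, names[uid]) for uid in user_ids]
```

With a DataFrame, a mapping like this could be applied in one step with `df['Name'] = df['User ID'].map(names)` instead of writing cell by cell.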


## 3. Generating random Message IDs for each message

In [51]:
# One random (version 4) UUID per row
df['Message ID'] = [uuid.uuid4() for _ in range(len(df))]
df.sample(10)

| | User ID | Timestamp (UTC) | Message Body | Name | Message ID |
|---|---|---|---|---|---|
| 86 | 6884 | 2017-02-01 19:40:52 | OK I have paid all of it | Yolanda Ochieng | 52a3bd79-43fa-451b-a862-2036703c993f |
| 88 | 7457 | 2017-02-01 22:26:17 | How do I get a loan | Andrew Obiero | 6ab3b5c3-cbe7-401d-b89a-4b4512ce5ecd |
| 23 | 1481 | 2017-02-03 1:52:01 | Hello. Why can't you make the loan payment opt... | Erik Kosgei | 0eed2257-288c-4c8c-af99-18217cf1894d |
| 35 | 2126 | 2017-02-01 15:33:06 | Why has loan been rejected? | Jake Owiti | 98855882-9878-4f72-9069-90b0a8a43013 |
| 54 | 3112 | 2017-02-02 11:54:53 | Can I have direct contact thus I keep untouche... | James Kiplagat | 0b764fe7-e885-4477-bb3c-eeb431d96cb4 |
| 77 | 5297 | 2017-02-03 15:38:22 | it can't be 1264 had paid 400 earlier pls upda... | Christina Wambui | bb6dc14f-8556-4ec3-a395-d63b0e35bb64 |
| 15 | 1155 | 2017-02-03 7:01:34 | Hello,our salaries have been delayed but hopef... | Marvin Kiplagat | 8f930a26-5387-4bbb-b7f6-c606f9021a70 |
| 90 | 7812 | 2017-02-01 10:19:24 | Hi Branch,by 7th i promise to make some paymen... | Adrian Oyoo | f815fa0b-b277-4ad6-ace1-490fef90ac4a |
| 84 | 6515 | 2017-02-02 2:11:40 | The weekly text rimindance | Peter Koskei | af335e5c-c780-4258-a12f-f49fc482c76f |
| 70 | 4442 | 2017-02-02 15:31:06 | Hi! Am sure acc details are correct. Have not ... | Patrick Kiptanui | d24aee16-730c-4d9b-9105-9dd4070c9c66 |
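
If the IDs need to be written to JSON or a database later, it may be convenient to store them as strings rather than `uuid.UUID` objects. The following is a minimal sketch of that variation; `make_message_ids` is a hypothetical helper name, and the duplicate check is a cheap safeguard rather than something the notebook does.

```python
import uuid


def make_message_ids(count):
    """Generate `count` random version 4 UUIDs as strings."""
    ids = [str(uuid.uuid4()) for _ in range(count)]
    # Collisions are astronomically unlikely, but the check costs almost nothing.
    if len(set(ids)) != count:
        raise ValueError("duplicate message ID generated")
    return ids


message_ids = make_message_ids(100)
```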


## 4. Shape data to mimic database schema that is in place 
- A message in the schema should have these fields: [content, messageid, name, timestamp, userid]
- The `timestamp` field should also be converted to Unix epoch time (seconds).

In [52]:
df = df.rename(columns={'User ID':'userid','Message ID':'messageid','Name':'name','Message Body':'content', 'Timestamp (UTC)':'timestamp'})
df.head(5)

| | userid | timestamp | content | name | messageid |
|---|---|---|---|---|---|
| 0 | 208 | 2017-02-01 19:29:05 | So it means if u pay ua loan before the due da... | Amber Wambui | 4a373a53-644b-435b-93cd-e09be4ed90cd |
| 1 | 208 | 2017-02-01 19:21:58 | The dates of payment are still indicated n no ... | Amber Wambui | faa5618f-c444-4b7c-9bdf-91e57272b728 |
| 2 | 208 | 2017-02-01 19:21:18 | Why was my application rejected | Amber Wambui | 0737d603-0fcd-41bd-b4e8-7f370e39bf3b |
| 3 | 208 | 2017-02-01 19:05:45 | Hi branch I requested my number to remain the ... | Amber Wambui | 91917bfe-a3c9-4cbb-ba81-83398f0bd959 |
| 4 | 218 | 2017-02-01 16:08:21 | I said ill pay 5th esther camoon.. Infact you ... | Jennifer Kiplimo | 6e7ef7c4-700f-48f5-bb4d-352e794f7b6f |
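
After a rename like this, a quick check that every schema field is present can catch a typo in the mapping early. This is a small sketch under the assumption that the schema is exactly the five fields listed above; `REQUIRED_FIELDS` and `missing_fields` are hypothetical names.

```python
# The five fields named in the schema description above.
REQUIRED_FIELDS = {"content", "messageid", "name", "timestamp", "userid"}


def missing_fields(columns):
    """Return the schema fields that are absent from `columns`, sorted."""
    return sorted(REQUIRED_FIELDS - set(columns))


renamed = ["userid", "timestamp", "content", "name", "messageid"]
not_renamed = ["User ID", "timestamp", "content", "name", "messageid"]

ok = missing_fields(renamed)
problem = missing_fields(not_renamed)
```

In the notebook this could be called as `missing_fields(df.columns)`.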


In [53]:
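# Converting all timestamp fields to Epoch time (seconds)
# Note: strptime() returns a naive datetime, so .timestamp() interprets it in the
# machine's local time zone rather than UTC.
df['timestamp'] = df['timestamp'].apply(lambda x: int(datetime.strptime(x,'%Y-%m-%d %H:%M:%S').timestamp()))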
df.sample(5)

| | userid | timestamp | content | name | messageid |
|---|---|---|---|---|---|
| 65 | 3897 | 1486097801 | I've settled many of your loans before please ... | John Kimani | 400a9613-15dc-472d-8a93-76667b68bdc0 |
| 38 | 2780 | 1485896755 | I cant access your services | Chelsey Odera | 95aba012-55f8-4da7-928e-45395bb1f6c9 |
| 9 | 676 | 1486121005 | Hi can i get the batch number pl | David Rotich | 4e91a1dc-561b-48a3-aa74-34cf67d93a0d |
| 31 | 2126 | 1485953492 | If there is a way u can check the mpesa sms in... | Jake Owiti | a4936f82-649a-4654-8afc-fb3c7bfc2269 |
| 81 | 6054 | 1486125963 | Hi, l have paid my loan on time but, my loan h... | Diane Kemei | 9b244ab6-d43f-4e41-9d64-66e5248bd48b |
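
Because the source column is labelled UTC, it may be worth making the conversion explicit about the time zone. A naive `datetime` passed to `.timestamp()` is treated as local time, so the result depends on the machine that runs the notebook. The sketch below attaches UTC before converting; `to_epoch_utc` is a hypothetical helper, not part of the notebook.

```python
from datetime import datetime, timezone


def to_epoch_utc(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse `text` as a UTC wall-clock time and return Unix epoch seconds."""
    naive = datetime.strptime(text, fmt)
    # Attach UTC explicitly so the local time zone plays no part.
    return int(naive.replace(tzinfo=timezone.utc).timestamp())


epoch = to_epoch_utc("2017-02-03 14:23:25")  # → 1486131805
```

For row 9 this gives 1486131805, which is 10800 seconds (3 hours) later than the 1486121005 shown above. That difference is what a machine set to UTC+3 would produce with the naive conversion.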


## 5. Final Reformat to match Conversations Shape
Each conversation should have this shape:

    conversation = {
        agentuuid: agentid,
        messages: [{content: messageContent, senderuuid: senderid, timestamp: timestamp}],
        timestarted: timestamp,
        username: name,
        senderuuid: userid
    }

In [56]:
# Build one conversation dict per user
grouped_df = df.groupby('userid').apply(lambda x: {
    'agentuuid': "",
    'messages': [{
        'content': row['content'],
        'senderuuid': row['userid'],
        'timestamp': row['timestamp']
    } for _, row in x.iterrows()],
    'senderuuid': x.iloc[0]['userid'],
    # Earliest message marks the start; the rows are not sorted oldest-first
    'timestarted': x['timestamp'].min(),
    'username': x.iloc[0]['name']
}).reset_index(drop=True)

grouped_df.sample(10)

3     {'agentuuid': '', 'messages': [{'content': 'Hi...
49    {'agentuuid': '', 'messages': [{'content': 'De...
27    {'agentuuid': '', 'messages': [{'content': 'Th...
54    {'agentuuid': '', 'messages': [{'content': 'So...
28    {'agentuuid': '', 'messages': [{'content': 'Th...
36    {'agentuuid': '', 'messages': [{'content': 'it...
33    {'agentuuid': '', 'messages': [{'content': 'Me...
35    {'agentuuid': '', 'messages': [{'content': 'Hi...
51    {'agentuuid': '', 'messages': [{'content': 'I ...
5     {'agentuuid': '', 'messages': [{'content': 'I ...
dtype: object
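
The same grouping can be expressed without pandas, which may make the target shape easier to inspect and test. This is a sketch under a few assumptions: rows are plain dicts with the renamed fields, messages are ordered oldest-first, and `timestarted` is the earliest timestamp. `build_conversations` and the sample rows are illustrative, not taken from the dataset.

```python
from collections import defaultdict


def build_conversations(rows):
    """Group message rows by userid into conversation dicts."""
    by_user = defaultdict(list)
    for row in rows:
        by_user[row["userid"]].append(row)

    conversations = []
    for userid, user_rows in by_user.items():
        # Oldest message first, so the first entry marks the start.
        ordered = sorted(user_rows, key=lambda r: r["timestamp"])
        conversations.append({
            "agentuuid": "",  # no agent assigned yet
            "messages": [
                {"content": r["content"], "senderuuid": userid, "timestamp": r["timestamp"]}
                for r in ordered
            ],
            "senderuuid": userid,
            "timestarted": ordered[0]["timestamp"],
            "username": ordered[0]["name"],
        })
    return conversations


# Illustrative rows only (timestamps are made-up small integers).
rows = [
    {"userid": 1, "timestamp": 30, "content": "third", "name": "A B"},
    {"userid": 1, "timestamp": 10, "content": "first", "name": "A B"},
    {"userid": 2, "timestamp": 20, "content": "hello", "name": "C D"},
]
conversations = build_conversations(rows)
```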

In [57]:
grouped_df.to_json('final.json')
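
Note that `Series.to_json` defaults to `orient='index'`, so the file holds a JSON object keyed by the index labels (`"0"`, `"1"`, ...) rather than a JSON array. If a consumer expects a list, a small conversion like the sketch below may help; `conversations_as_list` is a hypothetical helper and the sample payload is illustrative.

```python
import json


def conversations_as_list(payload):
    """Turn an index-keyed JSON object into a list ordered by numeric key."""
    # Keys are strings, so sort them numerically ("10" must come after "2").
    return [payload[key] for key in sorted(payload, key=int)]


sample = json.loads(
    '{"0": {"username": "A"}, "10": {"username": "C"}, "2": {"username": "B"}}'
)
conversations = conversations_as_list(sample)
```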

And we are done. The data is now easier to present on the front end and easier to use.