## Getting the first and last tweet dates for each Twitter user

This notebook extracts the **unique user ID, screen name, account creation date, date of the first tweet in the dataset, and date of the last tweet in the dataset** for each user in a tweet collection (JSON). The result is the table shown in Step 3 below.

It was originally written for a Program on Extremism data request, but it works with any collection: replace the input file with your own tweet collection file.

#### 1) Setting the input file (JSON) and output file (CSV)

In [6]:
# For users: Change the filenames as you like.

INPUTFILE = "POE_json2.json"
OUTPUTFILE = "results.csv"

#### 2) Extracting the tweet date, user ID, screen name, and account creation date from the input data

In [2]:
# Write the CSV header row.
!echo "[]" | jq -r '["tweet_created_at","userID", "screen_name", "user_created_at"] | @csv' > "csvdata.csv"
# Append one row per tweet, converting both timestamps to ISO 8601 (UTC).
!cat $INPUTFILE | jq -r '[(.created_at | strptime("%A %B %d %T %z %Y") | todate), .user.id_str, .user.screen_name, (.user.created_at | strptime("%A %B %d %T %z %Y") | todate)] | @csv' >> "csvdata.csv"
# Preview the first lines of the intermediate file.
!head -5 "csvdata.csv"

"tweet_created_at","userID","screen_name","user_created_at"
"2017-04-10T11:04:58Z","3238710423","NageenNk","2015-05-06T11:21:39Z"
"2017-04-10T11:16:05Z","287745263","dappodan1","2011-04-25T16:06:38Z"
"2017-04-10T11:14:03Z","287745263","dappodan1","2011-04-25T16:06:38Z"
"2017-04-10T11:11:14Z","287745263","dappodan1","2011-04-25T16:06:38Z"

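The jq step above can also be sketched in plain Python, which may help where jq is not available. This is a minimal sketch, not the notebook's own method. It assumes the input holds one tweet JSON object per line and that timestamps use Twitter's classic layout (for example `Mon Apr 10 11:04:58 +0000 2017`). The helper names `to_iso_utc`, `tweets_to_rows`, and `write_csv`, as well as the sample tweet, are hypothetical.

```python
import csv
import io
import json
from datetime import datetime, timezone

# Assumed layout of Twitter timestamps, e.g. "Mon Apr 10 11:04:58 +0000 2017".
# %a and %b are locale-dependent; this assumes an English (C) locale.
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def to_iso_utc(twitter_timestamp):
    """Convert a Twitter-style timestamp to ISO 8601 in UTC."""
    parsed = datetime.strptime(twitter_timestamp, TWITTER_FORMAT)
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def tweets_to_rows(lines):
    """Yield one [tweet date, user ID, screen name, account date] row per JSON line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        user = tweet["user"]
        yield [
            to_iso_utc(tweet["created_at"]),
            user["id_str"],
            user["screen_name"],
            to_iso_utc(user["created_at"]),
        ]


def write_csv(lines, out):
    """Write the header and all rows, quoting every field as jq's @csv does for strings."""
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(["tweet_created_at", "userID", "screen_name", "user_created_at"])
    writer.writerows(tweets_to_rows(lines))


# Hypothetical sample tweet, reduced to the fields used here.
sample_line = json.dumps({
    "created_at": "Mon Apr 10 11:04:58 +0000 2017",
    "user": {
        "id_str": "123",
        "screen_name": "example_user",
        "created_at": "Wed May 06 11:21:39 +0000 2015",
    },
})
buffer = io.StringIO()
write_csv([sample_line], buffer)
print(buffer.getvalue())
```

To process a real file, something like `with open(INPUTFILE) as src, open("csvdata.csv", "w", newline="") as dst: write_csv(src, dst)` should work under the same assumptions.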

#### 3) Getting `first_tweet_date` and `last_tweet_date` for each user

In [3]:
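import pandas as pd

# Load the intermediate CSV produced in Step 2.
data = pd.read_csv("csvdata.csv", encoding = 'ISO-8859-1')
# Earliest and latest tweet timestamp per user. The timestamps are fixed-width
# ISO 8601 strings in UTC, so string min/max matches chronological order.
data2 = data.groupby(['userID', 'screen_name', 'user_created_at']).tweet_created_at.agg(['min', 'max'])
# Turn the group keys back into ordinary columns.
data3 = data2.reset_index()
data3.rename(columns={'min': 'first_tweet_date', 'max': 'last_tweet_date'}, inplace=True)
data3.head(5)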

| | userID | screen_name | user_created_at | first_tweet_date | last_tweet_date |
|---|---|---|---|---|---|
| 0 | 3143581 | UnitedStates | 2007-04-01T18:21:58Z | 2016-10-15T21:58:46Z | 2017-05-13T00:26:53Z |
| 1 | 18671937 | V_FreaKy | 2009-01-06T12:21:55Z | 2009-01-06T12:24:33Z | 2015-12-07T07:49:36Z |
| 2 | 37378504 | almanialkelli | 2009-05-03T06:26:47Z | 2009-05-03T06:27:25Z | 2017-04-25T18:19:06Z |
| 3 | 48733347 | Antizionism | 2009-06-19T15:23:04Z | 2009-06-19T16:27:58Z | 2010-10-29T22:25:47Z |
| 4 | 57914577 | ShamiWitness | 2009-07-18T11:43:46Z | 2014-11-20T18:50:52Z | 2014-12-11T17:00:39Z |
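Because Step 3 groups by `userID`, `screen_name`, and `user_created_at` together, an account whose screen name changed during the collection period would appear as more than one row. If one row per account is preferred, a variant along the following lines might be used. This is a sketch under assumptions, not the notebook's method: it groups by `userID` only, keeps the most recent screen name, and parses the tweet timestamps into real datetimes. The inline sample data is hypothetical, and named aggregation requires pandas 0.25 or later.

```python
import io

import pandas as pd

# Hypothetical data in the same layout as csvdata.csv; user 1 changed screen name.
csv_text = """tweet_created_at,userID,screen_name,user_created_at
2017-01-02T00:00:00Z,1,old_name,2015-01-01T00:00:00Z
2017-03-04T00:00:00Z,1,new_name,2015-01-01T00:00:00Z
2017-02-03T00:00:00Z,2,other_user,2016-06-01T00:00:00Z
"""

# Read user IDs as strings so they are never treated as numbers.
data = pd.read_csv(io.StringIO(csv_text), dtype={"userID": str})
data["tweet_created_at"] = pd.to_datetime(data["tweet_created_at"], utc=True)

summary = (
    data.sort_values("tweet_created_at")  # so "last" means the most recent tweet
    .groupby("userID")
    .agg(
        screen_name=("screen_name", "last"),
        user_created_at=("user_created_at", "first"),
        first_tweet_date=("tweet_created_at", "min"),
        last_tweet_date=("tweet_created_at", "max"),
    )
    .reset_index()
)
print(summary)
```

With this variant, `len(summary)` counts distinct user IDs rather than distinct combinations of ID, screen name, and creation date.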


In [4]:
# Number of result rows, i.e. unique (userID, screen_name, user_created_at) combinations
len(data3)

911

#### 4) Exporting the results to a CSV file

In [5]:
# Export the results to the CSV file named by OUTPUTFILE, which is set at the beginning of this notebook.
data3.to_csv(OUTPUTFILE, index=False)
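
After exporting, a quick sanity check on the file can catch parsing problems early. The sketch below uses only the standard library and flags rows where the dates are not in the expected order (account created, then first tweet, then last tweet). The function name `find_inconsistent_rows` and the sample rows are hypothetical. Whether an out-of-order row indicates an error depends on the dataset, so treat the result as a prompt for inspection rather than proof of a bug.

```python
import csv
import io


def find_inconsistent_rows(csv_file):
    """Return userIDs whose dates violate created <= first tweet <= last tweet."""
    bad = []
    for row in csv.DictReader(csv_file):
        # Fixed-width ISO 8601 UTC strings sort chronologically, so plain
        # string comparison is enough here.
        if not (row["user_created_at"] <= row["first_tweet_date"] <= row["last_tweet_date"]):
            bad.append(row["userID"])
    return bad


# Hypothetical rows in the same layout as the exported file.
sample = io.StringIO(
    "userID,screen_name,user_created_at,first_tweet_date,last_tweet_date\n"
    "1,ok_user,2015-01-01T00:00:00Z,2016-01-01T00:00:00Z,2017-01-01T00:00:00Z\n"
    "2,odd_user,2016-01-01T00:00:00Z,2015-06-01T00:00:00Z,2017-01-01T00:00:00Z\n"
)
print(find_inconsistent_rows(sample))  # prints ['2']
```

For the real output, the check could be run as `with open(OUTPUTFILE, newline="") as f: print(find_inconsistent_rows(f))`.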