### Quality 
| SNo. | Table | Issue |
| :--- | :--- | :--- |
| 1. | `twitter_archive_df` | Incorrect datatypes (tweet_id, in_reply_to_status_id, in_reply_to_user_id, timestamp, source) |
| 2. | `twitter_archive_df` | name, doggo, floofer, pupper, puppo contain the string None instead of NaN |
| 3. | `twitter_archive_df` | source contains HTML tags |
| 4. | `twitter_archive_df` | Invalid dog names need to be deleted |
| 5. | `twitter_archive_df` | rating_numerator and rating_denominator need to be corrected |
| 6. | `predictions_df` | Incorrect datatypes (tweet_id, p1, p2, p3) |
| 7. | `predictions_df` | Rows where p1_dog, p2_dog, p3_dog are False should be removed, as these are not dog types |
| 8. | `count_df` | Incorrect datatype (tweet_id) |

### Tidiness
| SNo. | Issue | 
| :--- | :--- | 
| 1. | doggo, floofer, pupper, puppo should be in one column |
| 2. | Combine p1, p2, and p3 into dog_type_prediction and prediction_confidence columns |
| 3. | Join all three tables |

In [182]:
import pandas as pd
import numpy as np

In [183]:
twitter_archive_df = pd.read_csv('Data/twitter-archive-enhanced.csv')
predictions_df = pd.read_csv('Data/image-predictions.tsv', sep='\t')
count_df = pd.read_csv('Data/tweet_count.csv')

In [184]:
twitter_archive_df_clean = twitter_archive_df.copy()
predictions_df_clean = predictions_df.copy()
count_df_clean = count_df.copy()

### Quality 

#### Define
1. Change the datatype of the following columns:
    - tweet_id to string
    - in_reply_to_status_id to string
    - in_reply_to_user_id to string
    - timestamp to datetime
    - source to category

#### Code

In [185]:
twitter_archive_df_clean.tweet_id = twitter_archive_df_clean.tweet_id.astype(str)
twitter_archive_df_clean.in_reply_to_status_id = twitter_archive_df_clean.in_reply_to_status_id.astype(str)
twitter_archive_df_clean.in_reply_to_user_id = twitter_archive_df_clean.in_reply_to_user_id.astype(str)
twitter_archive_df_clean.source = twitter_archive_df_clean.source.astype('category')
twitter_archive_df_clean.timestamp = pd.to_datetime(twitter_archive_df_clean.timestamp)

#### Test

In [186]:
twitter_archive_df_clean.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2356 entries, 0 to 2355
Data columns (total 17 columns):
 #   Column                      Non-Null Count  Dtype              
---  ------                      --------------  -----              
 0   tweet_id                    2356 non-null   object             
 1   in_reply_to_status_id       2356 non-null   object             
 2   in_reply_to_user_id         2356 non-null   object             
 3   timestamp                   2356 non-null   datetime64[ns, UTC]
 4   source                      2356 non-null   category           
 5   text                        2356 non-null   object             
 6   retweeted_status_id         181 non-null    float64            
 7   retweeted_status_user_id    181 non-null    float64            
 8   retweeted_status_timestamp  181 non-null    object             
 9   expanded_urls               2297 non-null   object             
 10  rating_numerator            2356 non-null   int64           
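
The `info()` output above reports 2356 non-null values for `in_reply_to_status_id` and `in_reply_to_user_id`, which suggests that missing values were turned into text by `astype(str)`. The following is a minimal sketch of one alternative, assuming the IDs arrive as floats with NaN gaps; the Series and its values are made up for illustration and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical reply IDs as pandas reads a column with gaps: floats plus NaN.
reply_ids = pd.Series([12345.0, np.nan, 67890.0])

# Float -> nullable integer -> nullable string keeps missing values as <NA>
# and avoids a trailing ".0" in the text form of each ID.
reply_ids_str = reply_ids.astype("Int64").astype("string")

print(reply_ids_str.tolist())
```

Real tweet IDs are larger than the integers float64 can represent exactly, so reading such columns as strings at load time (the `dtype` argument of `pd.read_csv`) may be the safer route.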

#### Define
2. Replace **None** with **NaN** in the following columns:
    - name
    - doggo
    - floofer
    - pupper
    - puppo

#### Code

In [187]:
twitter_archive_df_clean.name = twitter_archive_df_clean.name.replace('None', np.nan)
twitter_archive_df_clean.doggo = twitter_archive_df_clean.doggo.replace('None', np.nan)
twitter_archive_df_clean.floofer = twitter_archive_df_clean.floofer.replace('None', np.nan)
twitter_archive_df_clean.pupper = twitter_archive_df_clean.pupper.replace('None', np.nan)
twitter_archive_df_clean.puppo = twitter_archive_df_clean.puppo.replace('None', np.nan)

#### Test

In [188]:
print(twitter_archive_df_clean[twitter_archive_df_clean.name=='None'])
print(twitter_archive_df_clean[twitter_archive_df_clean.doggo=='None'])
print(twitter_archive_df_clean[twitter_archive_df_clean.floofer=='None'])
print(twitter_archive_df_clean[twitter_archive_df_clean.pupper=='None'])
print(twitter_archive_df_clean[twitter_archive_df_clean.puppo=='None'])

Empty DataFrame
Columns: [tweet_id, in_reply_to_status_id, in_reply_to_user_id, timestamp, source, text, retweeted_status_id, retweeted_status_user_id, retweeted_status_timestamp, expanded_urls, rating_numerator, rating_denominator, name, doggo, floofer, pupper, puppo]
Index: []
Empty DataFrame
Columns: [tweet_id, in_reply_to_status_id, in_reply_to_user_id, timestamp, source, text, retweeted_status_id, retweeted_status_user_id, retweeted_status_timestamp, expanded_urls, rating_numerator, rating_denominator, name, doggo, floofer, pupper, puppo]
Index: []
Empty DataFrame
Columns: [tweet_id, in_reply_to_status_id, in_reply_to_user_id, timestamp, source, text, retweeted_status_id, retweeted_status_user_id, retweeted_status_timestamp, expanded_urls, rating_numerator, rating_denominator, name, doggo, floofer, pupper, puppo]
Index: []
Empty DataFrame
Columns: [tweet_id, in_reply_to_status_id, in_reply_to_user_id, timestamp, source, text, retweeted_status_id, retweeted_status_user_id, retweete
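
The same replacement and check can be written once for all five columns. This is a sketch on a small made-up frame (the rows are hypothetical), counting remaining `'None'` strings instead of printing five empty DataFrames.

```python
import numpy as np
import pandas as pd

# Hypothetical rows that mimic the 'None' placeholder in the archive.
df = pd.DataFrame({
    "name": ["Charlie", "None"],
    "doggo": ["None", "doggo"],
    "floofer": ["None", "None"],
    "pupper": ["pupper", "None"],
    "puppo": ["None", "None"],
})

cols = ["name", "doggo", "floofer", "pupper", "puppo"]

# Replace the placeholder string in every listed column in one call.
df[cols] = df[cols].replace("None", np.nan)

# A single number is easier to verify than five printed frames.
remaining = int((df[cols] == "None").sum().sum())
print(remaining)
```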

#### Define
3. Remove HTML tags from the source column

#### Code

In [189]:
twitter_archive_df_clean.source = twitter_archive_df_clean.source.str.replace('<a href="http://twitter.com/download/iphone" rel="nofollow">','')
twitter_archive_df_clean.source = twitter_archive_df_clean.source.str.replace('<a href="http://vine.co" rel="nofollow">','')
twitter_archive_df_clean.source = twitter_archive_df_clean.source.str.replace('<a href="http://twitter.com" rel="nofollow">','')
twitter_archive_df_clean.source = twitter_archive_df_clean.source.str.replace('<a href="https://about.twitter.com/products/tweetdeck" rel="nofollow">','')
twitter_archive_df_clean.source = twitter_archive_df_clean.source.str.replace('</a>','')

#### Test

In [190]:
twitter_archive_df_clean.source.value_counts()

Twitter for iPhone     2221
Vine - Make a Scene      91
Twitter Web Client       33
TweetDeck                11
Name: source, dtype: int64
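
Listing every anchor tag by hand works for the four sources seen here, but it would miss any new client. As a hedged alternative, the sketch below pulls the link text out with one regular expression; the two sample strings follow the tag shapes used in the cell above.

```python
import pandas as pd

# Two sample values in the same shape as the source column.
source = pd.Series([
    '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
    '<a href="http://vine.co" rel="nofollow">Vine - Make a Scene</a>',
])

# Capture everything between the end of the opening tag and the closing </a>.
clean_source = source.str.extract(r">([^<]+)</a>", expand=False)

print(clean_source.tolist())
```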

#### Define
4. Delete rows with invalid dog names, i.e. names that start with a lowercase letter and names that are NaN/None

#### Code

In [191]:
twitter_archive_df_clean = twitter_archive_df_clean.dropna(subset=['name'])
twitter_archive_df_clean.name = twitter_archive_df_clean[twitter_archive_df_clean.name.str[0].str.isupper()].name

#### Test

In [192]:
twitter_archive_df_clean.name.value_counts()

Charlie         12
Oliver          11
Lucy            11
Cooper          11
Penny           10
                ..
Colin            1
Alexanderson     1
Cuddles          1
Schnitzel        1
Rey              1
Name: name, Length: 931, dtype: int64

In [193]:
twitter_archive_df_clean.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1611 entries, 0 to 2354
Data columns (total 17 columns):
 #   Column                      Non-Null Count  Dtype              
---  ------                      --------------  -----              
 0   tweet_id                    1611 non-null   object             
 1   in_reply_to_status_id       1611 non-null   object             
 2   in_reply_to_user_id         1611 non-null   object             
 3   timestamp                   1611 non-null   datetime64[ns, UTC]
 4   source                      1611 non-null   object             
 5   text                        1611 non-null   object             
 6   retweeted_status_id         116 non-null    float64            
 7   retweeted_status_user_id    116 non-null    float64            
 8   retweeted_status_timestamp  116 non-null    object             
 9   expanded_urls               1611 non-null   object             
 10  rating_numerator            1611 non-null   int64           
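
Note that assigning a filtered Series back to `name` aligns on the index, so names that start with a lowercase letter become NaN while their rows stay in the frame. If the goal is to delete those rows, as the Define step states, a boolean mask on the frame itself is one option. A minimal sketch on made-up names:

```python
import numpy as np
import pandas as pd

# Hypothetical names: two valid, two lowercase words, one missing.
df = pd.DataFrame({"name": ["Charlie", "a", np.nan, "such", "Lucy"]})

# Drop missing names first so the string methods below see only text.
df = df.dropna(subset=["name"])

# Keep only rows whose name starts with an uppercase letter.
starts_upper = df["name"].str[0].str.isupper()
df = df[starts_upper]

print(df["name"].tolist())
```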

#### Define
5. Fix rating_numerator and rating_denominator: in 4 tweets the numerator was taken from the digits after the decimal point of the rating

#### Code

In [194]:
value_df = twitter_archive_df_clean.loc[twitter_archive_df_clean.text.str.contains(r'\d+\.\d+/'), 'text'].str.extract(r'(\d+\.\d+)')
value_df

|  | 0 |
| :--- | :--- |
| 45 | 13.5 |
| 340 | 9.75 |
| 695 | 9.75 |
| 763 | 11.27 |


In [195]:
twitter_archive_df_clean.at[45,'rating_numerator']=float(value_df.loc[45][0])
twitter_archive_df_clean.at[340, 'rating_numerator']=float(value_df.loc[340][0])
twitter_archive_df_clean.at[695, 'rating_numerator']=float(value_df.loc[695][0])
twitter_archive_df_clean.at[763, 'rating_numerator']=float(value_df.loc[763][0])

#### Test

In [196]:
twitter_archive_df_clean.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1611 entries, 0 to 2354
Data columns (total 17 columns):
 #   Column                      Non-Null Count  Dtype              
---  ------                      --------------  -----              
 0   tweet_id                    1611 non-null   object             
 1   in_reply_to_status_id       1611 non-null   object             
 2   in_reply_to_user_id         1611 non-null   object             
 3   timestamp                   1611 non-null   datetime64[ns, UTC]
 4   source                      1611 non-null   object             
 5   text                        1611 non-null   object             
 6   retweeted_status_id         116 non-null    float64            
 7   retweeted_status_user_id    116 non-null    float64            
 8   retweeted_status_timestamp  116 non-null    object             
 9   expanded_urls               1611 non-null   object             
 10  rating_numerator            1611 non-null   int64           

In [197]:
print(twitter_archive_df_clean.loc[45].rating_numerator)
print(twitter_archive_df_clean.loc[340].rating_numerator)
print(twitter_archive_df_clean.loc[695].rating_numerator)
print(twitter_archive_df_clean.loc[763].rating_numerator)

13
9
9
11
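
The printed values (13, 9, 9, 11) show that the decimal parts were lost: `rating_numerator` is still `int64`, so the floats were truncated on assignment. One way around this, sketched below on made-up tweets, is to cast the column to float before writing the extracted values. The texts and starting numerators are hypothetical.

```python
import pandas as pd

# Hypothetical tweets: two with decimal ratings that were parsed incorrectly.
df = pd.DataFrame({
    "text": ["This is Bella. 13.5/10 would pet", "Meet Sam. 12/10", "Logan 9.75/10"],
    "rating_numerator": [5, 12, 75],
})

# Cast first so the column can hold fractional ratings.
df["rating_numerator"] = df["rating_numerator"].astype(float)

# Extract a decimal numerator only when it is followed by "/<denominator>".
decimals = df["text"].str.extract(r"(\d+\.\d+)/\d+", expand=False).astype(float)

# Overwrite only the rows where a decimal rating was found.
mask = decimals.notna()
df.loc[mask, "rating_numerator"] = decimals[mask]

print(df["rating_numerator"].tolist())
```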


#### Define
6. Change the datatype of the following columns:
    - tweet_id to string
    - p1 to category
    - p2 to category
    - p3 to category
    - p1_dog to boolean
    - p2_dog to boolean
    - p3_dog to boolean

#### Code

In [198]:
predictions_df_clean.tweet_id = predictions_df_clean.tweet_id.astype(str)
predictions_df_clean.p1 = predictions_df_clean.p1.astype('category')
predictions_df_clean.p2 = predictions_df_clean.p2.astype('category')
predictions_df_clean.p3 = predictions_df_clean.p3.astype('category')
predictions_df_clean.p1_dog = predictions_df_clean.p1_dog.astype(bool)
predictions_df_clean.p2_dog = predictions_df_clean.p2_dog.astype(bool)
predictions_df_clean.p3_dog = predictions_df_clean.p3_dog.astype(bool)

#### Test

In [199]:
predictions_df_clean.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2075 entries, 0 to 2074
Data columns (total 12 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   tweet_id  2075 non-null   object  
 1   jpg_url   2075 non-null   object  
 2   img_num   2075 non-null   int64   
 3   p1        2075 non-null   category
 4   p1_conf   2075 non-null   float64 
 5   p1_dog    2075 non-null   bool    
 6   p2        2075 non-null   category
 7   p2_conf   2075 non-null   float64 
 8   p2_dog    2075 non-null   bool    
 9   p3        2075 non-null   category
 10  p3_conf   2075 non-null   float64 
 11  p3_dog    2075 non-null   bool    
dtypes: bool(3), category(3), float64(3), int64(1), object(2)
memory usage: 174.9+ KB
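The seven separate casts can also be expressed as one `astype` call with a column-to-dtype mapping. A small sketch, using a made-up three-column frame in place of the predictions table:

```python
import pandas as pd

# Hypothetical slice of the predictions table.
df = pd.DataFrame({
    "tweet_id": [101, 102],
    "p1": ["Pembroke", "Chihuahua"],
    "p1_dog": [1, 0],
})

# One call, one mapping: easier to review than a line per column.
df = df.astype({"tweet_id": str, "p1": "category", "p1_dog": bool})

print(df.dtypes)
```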


#### Define
7. Remove rows where p1_dog, p2_dog, or p3_dog is False, as these predictions are not dog types

#### Code

In [200]:
predictions_df_clean = predictions_df_clean[predictions_df_clean.p1_dog]
predictions_df_clean = predictions_df_clean[predictions_df_clean.p2_dog]
predictions_df_clean = predictions_df_clean[predictions_df_clean.p3_dog]

In [201]:
predictions_df_clean.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1243 entries, 0 to 2073
Data columns (total 12 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   tweet_id  1243 non-null   object  
 1   jpg_url   1243 non-null   object  
 2   img_num   1243 non-null   int64   
 3   p1        1243 non-null   category
 4   p1_conf   1243 non-null   float64 
 5   p1_dog    1243 non-null   bool    
 6   p2        1243 non-null   category
 7   p2_conf   1243 non-null   float64 
 8   p2_dog    1243 non-null   bool    
 9   p3        1243 non-null   category
 10  p3_conf   1243 non-null   float64 
 11  p3_dog    1243 non-null   bool    
dtypes: bool(3), category(3), float64(3), int64(1), object(2)
memory usage: 138.2+ KB


#### Test

In [202]:
set(predictions_df_clean.p1)

{'Afghan_hound',
 'Airedale',
 'American_Staffordshire_terrier',
 'Appenzeller',
 'Australian_terrier',
 'Bedlington_terrier',
 'Bernese_mountain_dog',
 'Blenheim_spaniel',
 'Border_collie',
 'Border_terrier',
 'Boston_bull',
 'Brabancon_griffon',
 'Brittany_spaniel',
 'Cardigan',
 'Chesapeake_Bay_retriever',
 'Chihuahua',
 'Dandie_Dinmont',
 'Doberman',
 'English_setter',
 'English_springer',
 'EntleBucher',
 'Eskimo_dog',
 'French_bulldog',
 'German_shepherd',
 'German_short-haired_pointer',
 'Gordon_setter',
 'Great_Dane',
 'Great_Pyrenees',
 'Greater_Swiss_Mountain_dog',
 'Ibizan_hound',
 'Irish_setter',
 'Irish_terrier',
 'Irish_water_spaniel',
 'Italian_greyhound',
 'Japanese_spaniel',
 'Labrador_retriever',
 'Lakeland_terrier',
 'Leonberg',
 'Lhasa',
 'Maltese_dog',
 'Mexican_hairless',
 'Newfoundland',
 'Norfolk_terrier',
 'Norwegian_elkhound',
 'Norwich_terrier',
 'Old_English_sheepdog',
 'Pekinese',
 'Pembroke',
 'Pomeranian',
 'Rhodesian_ridgeback',
 'Rottweiler',
 'Saint_Be

In [203]:
set(predictions_df_clean.p2)

{'Afghan_hound',
 'Airedale',
 'American_Staffordshire_terrier',
 'Appenzeller',
 'Australian_terrier',
 'Bedlington_terrier',
 'Blenheim_spaniel',
 'Border_collie',
 'Border_terrier',
 'Boston_bull',
 'Brabancon_griffon',
 'Brittany_spaniel',
 'Cardigan',
 'Chesapeake_Bay_retriever',
 'Chihuahua',
 'Dandie_Dinmont',
 'Doberman',
 'English_foxhound',
 'English_setter',
 'English_springer',
 'EntleBucher',
 'Eskimo_dog',
 'French_bulldog',
 'German_shepherd',
 'German_short-haired_pointer',
 'Great_Dane',
 'Great_Pyrenees',
 'Greater_Swiss_Mountain_dog',
 'Ibizan_hound',
 'Irish_setter',
 'Irish_terrier',
 'Irish_wolfhound',
 'Italian_greyhound',
 'Japanese_spaniel',
 'Kerry_blue_terrier',
 'Labrador_retriever',
 'Lakeland_terrier',
 'Leonberg',
 'Lhasa',
 'Maltese_dog',
 'Mexican_hairless',
 'Newfoundland',
 'Norfolk_terrier',
 'Norwegian_elkhound',
 'Norwich_terrier',
 'Old_English_sheepdog',
 'Pekinese',
 'Pembroke',
 'Pomeranian',
 'Rhodesian_ridgeback',
 'Rottweiler',
 'Saint_Berna

In [204]:
set(predictions_df_clean.p3)

{'Afghan_hound',
 'Airedale',
 'American_Staffordshire_terrier',
 'Appenzeller',
 'Australian_terrier',
 'Bernese_mountain_dog',
 'Blenheim_spaniel',
 'Border_collie',
 'Border_terrier',
 'Boston_bull',
 'Bouvier_des_Flandres',
 'Brabancon_griffon',
 'Brittany_spaniel',
 'Cardigan',
 'Chesapeake_Bay_retriever',
 'Chihuahua',
 'Dandie_Dinmont',
 'Doberman',
 'English_foxhound',
 'English_setter',
 'English_springer',
 'EntleBucher',
 'Eskimo_dog',
 'French_bulldog',
 'German_shepherd',
 'German_short-haired_pointer',
 'Gordon_setter',
 'Great_Dane',
 'Great_Pyrenees',
 'Greater_Swiss_Mountain_dog',
 'Ibizan_hound',
 'Irish_setter',
 'Irish_terrier',
 'Irish_water_spaniel',
 'Irish_wolfhound',
 'Italian_greyhound',
 'Japanese_spaniel',
 'Kerry_blue_terrier',
 'Labrador_retriever',
 'Lakeland_terrier',
 'Leonberg',
 'Lhasa',
 'Maltese_dog',
 'Mexican_hairless',
 'Newfoundland',
 'Norfolk_terrier',
 'Norwegian_elkhound',
 'Norwich_terrier',
 'Old_English_sheepdog',
 'Pekinese',
 'Pembroke'
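
The three successive filters keep only rows where all three predictions are dogs. The same condition can be stated as one mask, which also makes the strictness of the rule explicit. A sketch on made-up flags:

```python
import pandas as pd

# Hypothetical flags: only the first row has three dog predictions.
df = pd.DataFrame({
    "tweet_id": ["1", "2", "3"],
    "p1_dog": [True, True, False],
    "p2_dog": [True, False, True],
    "p3_dog": [True, True, True],
})

# True only where every one of the three flags is True.
all_dogs = df[["p1_dog", "p2_dog", "p3_dog"]].all(axis=1)
dogs_only = df[all_dogs]

print(dogs_only["tweet_id"].tolist())
```

Swapping `.all(axis=1)` for `.any(axis=1)` would instead keep rows with at least one dog prediction, which is a different cleaning decision from the one made here.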

#### Define
8. Change the datatype of tweet_id to string

#### Code

In [205]:
count_df_clean.tweet_id = count_df_clean.tweet_id.astype(str)

#### Test

In [206]:
count_df_clean.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2354 entries, 0 to 2353
Data columns (total 3 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   tweet_id        2354 non-null   object
 1   retweet_count   2354 non-null   int64 
 2   favorite_count  2354 non-null   int64 
dtypes: int64(2), object(1)
memory usage: 55.3+ KB
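An alternative to casting after the fact is to ask `pd.read_csv` for strings when the file is loaded. The sketch below reads from an in-memory string so it runs on its own; the ID and counts are made-up values, and only the column names come from the table above.

```python
import io

import pandas as pd

# Hypothetical file contents with the same columns as count_df.
csv_text = "tweet_id,retweet_count,favorite_count\n1000000000000000001,5,10\n"

# dtype= keeps tweet_id as text from the start; other columns are inferred.
counts = pd.read_csv(io.StringIO(csv_text), dtype={"tweet_id": str})

print(counts["tweet_id"].iloc[0])
```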


### Tidiness 

#### Define
1. doggo, floofer, pupper, puppo should be in one column

#### Code

In [207]:
twitter_archive_df_clean.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1611 entries, 0 to 2354
Data columns (total 17 columns):
 #   Column                      Non-Null Count  Dtype              
---  ------                      --------------  -----              
 0   tweet_id                    1611 non-null   object             
 1   in_reply_to_status_id       1611 non-null   object             
 2   in_reply_to_user_id         1611 non-null   object             
 3   timestamp                   1611 non-null   datetime64[ns, UTC]
 4   source                      1611 non-null   object             
 5   text                        1611 non-null   object             
 6   retweeted_status_id         116 non-null    float64            
 7   retweeted_status_user_id    116 non-null    float64            
 8   retweeted_status_timestamp  116 non-null    object             
 9   expanded_urls               1611 non-null   object             
 10  rating_numerator            1611 non-null   int64           

In [208]:
twitter_archive_df_clean = pd.melt(twitter_archive_df_clean,
                                  id_vars=['tweet_id',
                                           'in_reply_to_status_id',
                                           'in_reply_to_user_id',
                                           'timestamp',
                                           'source',
                                           'text',
                                           'retweeted_status_id',
                                           'retweeted_status_user_id',
                                           'retweeted_status_timestamp',
                                           'expanded_urls',
                                           'rating_numerator',
                                           'rating_denominator',
                                           'name'],
                                   value_vars=['doggo', 'floofer', 'pupper', 'puppo'],
                                   var_name='dog_stages',
                                   value_name='dog_stage'
                                  )
twitter_archive_df_clean = twitter_archive_df_clean.drop(columns='dog_stages')

#### Test

In [209]:
twitter_archive_df_clean.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6444 entries, 0 to 6443
Data columns (total 14 columns):
 #   Column                      Non-Null Count  Dtype              
---  ------                      --------------  -----              
 0   tweet_id                    6444 non-null   object             
 1   in_reply_to_status_id       6444 non-null   object             
 2   in_reply_to_user_id         6444 non-null   object             
 3   timestamp                   6444 non-null   datetime64[ns, UTC]
 4   source                      6444 non-null   object             
 5   text                        6444 non-null   object             
 6   retweeted_status_id         464 non-null    float64            
 7   retweeted_status_user_id    464 non-null    float64            
 8   retweeted_status_timestamp  464 non-null    object             
 9   expanded_urls               6444 non-null   object             
 10  rating_numerator            6444 non-null   int64           

In [210]:
twitter_archive_df_clean.dog_stage.value_counts()

pupper     148
doggo       53
puppo       20
floofer      5
Name: dog_stage, dtype: int64
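
Melting four columns turns each tweet into four rows (1611 becomes 6444), most of them with an empty `dog_stage`. A hedged alternative that keeps one row per tweet is to join the non-missing stage values row by row. The frame below is made up, and joining multiple stages with a comma is an assumption of this sketch, not something the notebook does.

```python
import numpy as np
import pandas as pd

# Hypothetical tweets: no stage, one stage, and two stages.
df = pd.DataFrame({
    "tweet_id": ["1", "2", "3"],
    "doggo": [np.nan, np.nan, "doggo"],
    "floofer": [np.nan, np.nan, np.nan],
    "pupper": [np.nan, "pupper", np.nan],
    "puppo": [np.nan, np.nan, "puppo"],
})

stage_cols = ["doggo", "floofer", "pupper", "puppo"]

# Join whatever stages are present; an empty result means no stage at all.
df["dog_stage"] = (
    df[stage_cols]
    .apply(lambda row: ",".join(row.dropna()), axis=1)
    .replace("", np.nan)
)
df = df.drop(columns=stage_cols)

print(df)
```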

#### Define
2. Combine p1, p2, and p3 into dog_type_prediction and prediction_confidence columns

#### Code

In [211]:
predictions_df_clean.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1243 entries, 0 to 2073
Data columns (total 12 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   tweet_id  1243 non-null   object  
 1   jpg_url   1243 non-null   object  
 2   img_num   1243 non-null   int64   
 3   p1        1243 non-null   category
 4   p1_conf   1243 non-null   float64 
 5   p1_dog    1243 non-null   bool    
 6   p2        1243 non-null   category
 7   p2_conf   1243 non-null   float64 
 8   p2_dog    1243 non-null   bool    
 9   p3        1243 non-null   category
 10  p3_conf   1243 non-null   float64 
 11  p3_dog    1243 non-null   bool    
dtypes: bool(3), category(3), float64(3), int64(1), object(2)
memory usage: 138.2+ KB


In [212]:
dog_type_prediction_list = []
prediction_confidence_list = []

def combine_prediction_confidence(df):
    if df['p1_dog']:
        dog_type_prediction_list.append(df['p1'])
        prediction_confidence_list.append(df['p1_conf'])
    elif df['p2_dog']:
        dog_type_prediction_list.append(df['p2'])
        prediction_confidence_list.append(df['p2_conf'])
    elif df['p3_dog']:
        dog_type_prediction_list.append(df['p3'])
        prediction_confidence_list.append(df['p3_conf'])
    else:
        dog_type_prediction_list.append(np.nan)
        prediction_confidence_list.append(np.nan)
        
predictions_df_clean.apply(combine_prediction_confidence,axis=1)
predictions_df_clean['dog_type_prediction']=dog_type_prediction_list
predictions_df_clean['prediction_confidence']=prediction_confidence_list
predictions_df_clean=predictions_df_clean.drop(['p1','p1_conf','p1_dog','p2','p2_conf','p2_dog','p3','p3_conf','p3_dog'],axis=1)

#### Test

In [213]:
predictions_df_clean.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1243 entries, 0 to 2073
Data columns (total 5 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   tweet_id               1243 non-null   object 
 1   jpg_url                1243 non-null   object 
 2   img_num                1243 non-null   int64  
 3   dog_type_prediction    1243 non-null   object 
 4   prediction_confidence  1243 non-null   float64
dtypes: float64(1), int64(1), object(3)
memory usage: 58.3+ KB
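
The same first-dog-wins rule can be written without module-level lists by chaining `Series.where`, starting from the lowest-priority prediction. This is a sketch on made-up predictions with plain string columns, not the notebook's data.

```python
import pandas as pd

# Hypothetical predictions: p1 is a dog, only p2/p3 are dogs, none is a dog.
df = pd.DataFrame({
    "p1": ["Pembroke", "paper_towel", "seat_belt"],
    "p1_conf": [0.9, 0.5, 0.4],
    "p1_dog": [True, False, False],
    "p2": ["Cardigan", "Chihuahua", "banana"],
    "p2_conf": [0.05, 0.3, 0.3],
    "p2_dog": [True, True, False],
    "p3": ["Chihuahua", "Pomeranian", "orange"],
    "p3_conf": [0.01, 0.1, 0.2],
    "p3_dog": [True, True, False],
})

# Start with p3 (NaN where it is not a dog), then let p2 and p1 override it.
breed = df["p3"].where(df["p3_dog"])
conf = df["p3_conf"].where(df["p3_dog"])
for n in ("2", "1"):
    is_dog = df[f"p{n}_dog"]
    breed = df[f"p{n}"].where(is_dog, breed)
    conf = df[f"p{n}_conf"].where(is_dog, conf)

df["dog_type_prediction"] = breed
df["prediction_confidence"] = conf

print(df[["dog_type_prediction", "prediction_confidence"]])
```

If `p1`, `p2`, and `p3` are stored as `category` with different category sets, casting them with `astype(object)` first may be needed before mixing their values.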


#### Define
3. Join twitter_archive_df_clean, predictions_df_clean and count_df_clean 

#### Code

In [214]:
twitter_archive_master = pd.merge(twitter_archive_df_clean, predictions_df_clean,
                                 on=['tweet_id'], how='left')
twitter_archive_master = pd.merge(twitter_archive_master, count_df_clean,
                                 on=['tweet_id'], how='left')

In [215]:
twitter_archive_master.retweet_count = twitter_archive_master.retweet_count.fillna(0)
twitter_archive_master.retweet_count = twitter_archive_master.retweet_count.astype(int)
twitter_archive_master.favorite_count = twitter_archive_master.favorite_count.fillna(0)
twitter_archive_master.favorite_count = twitter_archive_master.favorite_count.astype(int)

In [223]:
twitter_archive_master=twitter_archive_master.drop_duplicates()

#### Test

In [225]:
twitter_archive_master.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1837 entries, 0 to 5619
Data columns (total 20 columns):
 #   Column                      Non-Null Count  Dtype              
---  ------                      --------------  -----              
 0   tweet_id                    1837 non-null   object             
 1   in_reply_to_status_id       1837 non-null   object             
 2   in_reply_to_user_id         1837 non-null   object             
 3   timestamp                   1837 non-null   datetime64[ns, UTC]
 4   source                      1837 non-null   object             
 5   text                        1837 non-null   object             
 6   retweeted_status_id         139 non-null    float64            
 7   retweeted_status_user_id    139 non-null    float64            
 8   retweeted_status_timestamp  139 non-null    object             
 9   expanded_urls               1837 non-null   object             
 10  rating_numerator            1837 non-null   int64           
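
When joining on `tweet_id`, `pd.merge` can check the key relationship and record where each row came from. The sketch below uses two made-up frames; `validate` raises if a key is duplicated, and `indicator=True` adds a `_merge` column showing which rows found no match.

```python
import pandas as pd

# Hypothetical inputs: the second tweet has no entry in the counts table.
archive = pd.DataFrame({"tweet_id": ["1", "2"], "name": ["Charlie", "Lucy"]})
counts = pd.DataFrame({"tweet_id": ["1"], "retweet_count": [5], "favorite_count": [10]})

# Left join that fails loudly if tweet_id is duplicated on either side.
merged = pd.merge(archive, counts, on="tweet_id", how="left",
                  validate="one_to_one", indicator=True)

# Unmatched tweets get NaN counts; fill them and restore the integer dtype.
count_cols = ["retweet_count", "favorite_count"]
merged[count_cols] = merged[count_cols].fillna(0).astype(int)

print(merged)
```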

## Storing Data

In [227]:
twitter_archive_master.to_csv('Data/twitter_archive_master.csv', index=False)