# Project: Wrangling and Analyze Data

In [2]:
import pandas as pd
import numpy as np
import missingno as msno
import requests

## Data Gathering
In the cell below, gather **all** three pieces of data for this project and load them in the notebook. **Note:** the methods required to gather each piece of data are different.
1. Directly download the WeRateDogs Twitter archive data (twitter_archive_enhanced.csv)

In [3]:
df1 = pd.read_csv('twitter-archive-enhanced.csv')

In [4]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2356 entries, 0 to 2355
Data columns (total 17 columns):
tweet_id                      2356 non-null int64
in_reply_to_status_id         78 non-null float64
in_reply_to_user_id           78 non-null float64
timestamp                     2356 non-null object
source                        2356 non-null object
text                          2356 non-null object
retweeted_status_id           181 non-null float64
retweeted_status_user_id      181 non-null float64
retweeted_status_timestamp    181 non-null object
expanded_urls                 2297 non-null object
rating_numerator              2356 non-null int64
rating_denominator            2356 non-null int64
name                          2356 non-null object
doggo                         2356 non-null object
floofer                       2356 non-null object
pupper                        2356 non-null object
puppo                         2356 non-null object
dtypes: float64(4), int64(3), ob

In [6]:
df1.describe(exclude=np.number)

Unnamed: 0,timestamp,source,text,retweeted_status_timestamp,expanded_urls,name,doggo,floofer,pupper,puppo
count,2356,2356,2356,181,2297,2356.0,2356.0,2356.0,2356.0,2356.0
unique,2356,4,2356,181,2218,957.0,2.0,2.0,2.0,2.0
top,2015-11-28 02:20:27 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Elliot. He's blocking the roadway. Dow...,2016-03-01 20:11:59 +0000,https://twitter.com/dog_rates/status/775733305...,,,,,
freq,1,2221,1,1,2,745.0,2259.0,2346.0,2099.0,2326.0


In [7]:
df1[df1.name=='a']

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo
56,881536004380872706,,,2017-07-02 15:32:16 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Here is a pupper approaching maximum borkdrive...,,,,https://twitter.com/dog_rates/status/881536004...,14,10,a,,,pupper,
649,792913359805018113,,,2016-10-31 02:17:31 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Here is a perfect example of someone who has t...,,,,https://twitter.com/dog_rates/status/792913359...,13,10,a,,,,
801,772581559778025472,,,2016-09-04 23:46:12 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Guys this is getting so out of hand. We only r...,,,,https://twitter.com/dog_rates/status/772581559...,10,10,a,,,,
1002,747885874273214464,,,2016-06-28 20:14:22 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is a mighty rare blue-tailed hammer sherk...,,,,https://twitter.com/dog_rates/status/747885874...,8,10,a,,,,
1004,747816857231626240,,,2016-06-28 15:40:07 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Viewer discretion is advised. This is a terrib...,,,,https://twitter.com/dog_rates/status/747816857...,4,10,a,,,,
1017,746872823977771008,,,2016-06-26 01:08:52 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is a carrot. We only rate dogs. Please on...,,,,https://twitter.com/dog_rates/status/746872823...,11,10,a,,,,
1049,743222593470234624,,,2016-06-15 23:24:09 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is a very rare Great Alaskan Bush Pupper....,,,,https://twitter.com/dog_rates/status/743222593...,12,10,a,,,pupper,
1193,717537687239008257,,,2016-04-06 02:21:30 +0000,"<a href=""http://twitter.com/download/iphone"" r...",People please. This is a Deadly Mediterranean ...,,,,https://twitter.com/dog_rates/status/717537687...,11,10,a,,,,
1207,715733265223708672,,,2016-04-01 02:51:22 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is a taco. We only rate dogs. Please only...,,,,https://twitter.com/dog_rates/status/715733265...,10,10,a,,,,
1340,704859558691414016,,,2016-03-02 02:43:09 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Here is a heartbreaking scene of an incredible...,,,,https://twitter.com/dog_rates/status/704859558...,10,10,a,,,pupper,


In [8]:
df1.describe()


Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,retweeted_status_id,retweeted_status_user_id,rating_numerator,rating_denominator
count,2356.0,78.0,78.0,181.0,181.0,2356.0,2356.0
mean,7.427716e+17,7.455079e+17,2.014171e+16,7.7204e+17,1.241698e+16,13.126486,10.455433
std,6.856705e+16,7.582492e+16,1.252797e+17,6.236928e+16,9.599254e+16,45.876648,6.745237
min,6.660209e+17,6.658147e+17,11856340.0,6.661041e+17,783214.0,0.0,0.0
25%,6.783989e+17,6.757419e+17,308637400.0,7.186315e+17,4196984000.0,10.0,10.0
50%,7.196279e+17,7.038708e+17,4196984000.0,7.804657e+17,4196984000.0,11.0,10.0
75%,7.993373e+17,8.257804e+17,4196984000.0,8.203146e+17,4196984000.0,12.0,10.0
max,8.924206e+17,8.862664e+17,8.405479e+17,8.87474e+17,7.874618e+17,1776.0,170.0


2. Use the Requests library to download the tweet image prediction (image_predictions.tsv)

In [9]:
url = 'https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv'
file = 'image-predictions.tsv'

#### Downloading file using requests and pandas for comparison

In [10]:
response = requests.get(url)

In [11]:
# Write the downloaded bytes to disk; the with-block closes the file afterwards
with open(file, 'wb') as f:
    f.write(response.content)
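A slightly more defensive version of the download step is sketched below. It relies only on the raise_for_status() method and the content attribute that requests.Response provides. The helper name save_response and the StubResponse class (used so the example runs without network access) are hypothetical and not part of the notebook.

```python
import os
import tempfile


def save_response(response, path):
    """Write a downloaded response body to disk, failing early on HTTP errors."""
    response.raise_for_status()  # on requests.Response this raises for 4xx/5xx
    with open(path, "wb") as handle:
        handle.write(response.content)
    return path


class StubResponse:
    """Hypothetical stand-in for requests.Response so the demo runs offline."""
    content = b"tweet_id\tjpg_url\n"

    def raise_for_status(self):
        pass  # pretend the request succeeded


# Offline demonstration: save the stub's bytes into a temporary folder.
demo_path = os.path.join(tempfile.mkdtemp(), "image-predictions.tsv")
save_response(StubResponse(), demo_path)
with open(demo_path, "rb") as handle:
    saved_bytes = handle.read()
```

In the notebook itself the call would look like save_response(requests.get(url), file).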

In [12]:
df2 = pd.read_csv('image-predictions.tsv',sep='\t')
df2.head()

Unnamed: 0,tweet_id,jpg_url,img_num,p1,p1_conf,p1_dog,p2,p2_conf,p2_dog,p3,p3_conf,p3_dog
0,666020888022790149,https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg,1,Welsh_springer_spaniel,0.465074,True,collie,0.156665,True,Shetland_sheepdog,0.061428,True
1,666029285002620928,https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg,1,redbone,0.506826,True,miniature_pinscher,0.074192,True,Rhodesian_ridgeback,0.07201,True
2,666033412701032449,https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg,1,German_shepherd,0.596461,True,malinois,0.138584,True,bloodhound,0.116197,True
3,666044226329800704,https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg,1,Rhodesian_ridgeback,0.408143,True,redbone,0.360687,True,miniature_pinscher,0.222752,True
4,666049248165822465,https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg,1,miniature_pinscher,0.560311,True,Rottweiler,0.243682,True,Doberman,0.154629,True


In [13]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2075 entries, 0 to 2074
Data columns (total 12 columns):
tweet_id    2075 non-null int64
jpg_url     2075 non-null object
img_num     2075 non-null int64
p1          2075 non-null object
p1_conf     2075 non-null float64
p1_dog      2075 non-null bool
p2          2075 non-null object
p2_conf     2075 non-null float64
p2_dog      2075 non-null bool
p3          2075 non-null object
p3_conf     2075 non-null float64
p3_dog      2075 non-null bool
dtypes: bool(3), float64(3), int64(2), object(4)
memory usage: 152.1+ KB


3. Use the Tweepy library to query additional data via the Twitter API (tweet_json.txt)
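The notebook does not include code for this step. As a rough sketch, assuming tweet_json.txt has already been written with one JSON object per line and that each object carries the id, retweet_count and favorite_count fields, the file could be loaded as follows. The function name load_tweet_json, the DataFrame name df3 and the sample counts are hypothetical.

```python
import json
import os
import tempfile

import pandas as pd


def load_tweet_json(path):
    """Read one JSON object per line and keep three fields per tweet."""
    rows = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            tweet = json.loads(line)
            rows.append({
                "tweet_id": tweet["id"],
                "retweet_count": tweet["retweet_count"],
                "favorite_count": tweet["favorite_count"],
            })
    return pd.DataFrame(rows, columns=["tweet_id", "retweet_count", "favorite_count"])


# Hypothetical two-line sample file; the counts are made up for illustration.
sample_path = os.path.join(tempfile.mkdtemp(), "tweet_json.txt")
with open(sample_path, "w", encoding="utf-8") as handle:
    handle.write(json.dumps({"id": 666020888022790149, "retweet_count": 1, "favorite_count": 2}) + "\n")
    handle.write(json.dumps({"id": 666029285002620928, "retweet_count": 3, "favorite_count": 4}) + "\n")

df3 = load_tweet_json(sample_path)
```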

## Assessing Data
In this section, detect and document at least **eight (8) quality issues and two (2) tidiness issues**. You must use **both** visual assessment and programmatic assessment to assess the data.

**Note:** pay attention to the following key points when you assess the data.

* You only want original ratings (no retweets) that have images. Though there are 5000+ tweets in the dataset, not all are dog ratings and some are retweets.
* Assessing and cleaning the entire dataset completely would require a lot of time, and is not necessary to practice and demonstrate your skills in data wrangling. Therefore, the requirements of this project are only to assess and clean at least 8 quality issues and at least 2 tidiness issues in this dataset.
* The fact that the rating numerators are greater than the denominators does not need to be cleaned. This [unique rating system](http://knowyourmeme.com/memes/theyre-good-dogs-brent) is a big part of the popularity of WeRateDogs.
* You do not need to gather the tweets beyond August 1st, 2017. You can, but note that you won't be able to gather the image predictions for these tweets since you don't have access to the algorithm used.
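A few of the programmatic checks implied by the points above can be collected in one helper. This is only a sketch: the function name assess_archive and the four-row sample are hypothetical, while the column names match the archive table loaded earlier.

```python
import pandas as pd


def assess_archive(df):
    """Return a few programmatic checks for the archive table."""
    return {
        "duplicate_tweet_ids": int(df["tweet_id"].duplicated().sum()),
        "retweet_rows": int(df["retweeted_status_id"].notna().sum()),
        "denominator_not_10": int((df["rating_denominator"] != 10).sum()),
        "most_common_names": df["name"].value_counts().head(3),
    }


# Hypothetical four-row sample with the same column names as the archive.
sample = pd.DataFrame({
    "tweet_id": [1, 2, 3, 3],
    "retweeted_status_id": [None, 9.0, None, None],
    "rating_denominator": [10, 10, 170, 10],
    "name": ["a", "None", "a", "Bella"],
})
report = assess_archive(sample)
```

On the real data the call would be assess_archive(df1); suspicious values such as the name "a" or the string 'None' tend to surface at the top of the name counts.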



### Quality issues
1. Wrong Data types:


|Column | type |
| ----- | ----- |
| in_reply_to_status_id | float64 |
| in_reply_to_user_id | float64 |
| timestamp| object |
| retweeted_status_id |float64|
| retweeted_status_user_id | float64 |
| retweeted_status_timestamp| object | 


2. Dog stage columns with more than one classification per tweet

3. Dog name column uses the string 'None' instead of NaN (null) for missing data

4. Retweet rows

5. In-reply rows

6. Errors extracting the rating numerator, e.g. 5 instead of 13.5, 75 instead of 9.75, etc.

7. Tweets with "This is a ---" get the dog name "a"

8. Unnecessary columns
- source
- in_reply_to_status_id
- in_reply_to_user_id
- retweeted_status_id
- retweeted_status_user_id
- retweeted_status_timestamp



### Tidiness issues
1. Text column contains both the tweet text and the picture URL

2. Dog stage is spread across four columns (doggo, floofer, pupper, puppo)

## Cleaning Data
In this section, clean **all** of the issues you documented while assessing. 

**Note:** Make a copy of the original data before cleaning. Cleaning includes merging individual pieces of data according to the rules of [tidy data](https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html). The result should be a high-quality and tidy master pandas DataFrame (or DataFrames, if appropriate).
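The merge mentioned in the note can be sketched as an inner join on tweet_id, which keeps only tweets that also have an image prediction. The two small tables and the name master below are hypothetical stand-ins for the cleaned archive and prediction data.

```python
import pandas as pd

# Hypothetical miniature versions of the two cleaned tables.
archive = pd.DataFrame({
    "tweet_id": [1, 2, 3],
    "rating_numerator": [13, 12, 11],
})
predictions = pd.DataFrame({
    "tweet_id": [1, 3, 4],
    "p1": ["redbone", "collie", "malinois"],
})

# Inner join: a tweet survives only if it appears in both tables.
master = archive.merge(predictions, on="tweet_id", how="inner")
```

Tweet 2 has no prediction and tweet 4 has no archive entry, so neither appears in master.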

In [14]:
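# Make copies of original pieces of data (.copy() creates independent DataFrames;
# plain assignment would only create a second name for the same object)
df1_clean = df1.copy()
df2_clean = df2.copy()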

In [15]:
df1_clean.info();
df2_clean.info();

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2356 entries, 0 to 2355
Data columns (total 17 columns):
tweet_id                      2356 non-null int64
in_reply_to_status_id         78 non-null float64
in_reply_to_user_id           78 non-null float64
timestamp                     2356 non-null object
source                        2356 non-null object
text                          2356 non-null object
retweeted_status_id           181 non-null float64
retweeted_status_user_id      181 non-null float64
retweeted_status_timestamp    181 non-null object
expanded_urls                 2297 non-null object
rating_numerator              2356 non-null int64
rating_denominator            2356 non-null int64
name                          2356 non-null object
doggo                         2356 non-null object
floofer                       2356 non-null object
pupper                        2356 non-null object
puppo                         2356 non-null object
dtypes: float64(4), int64(3), ob

### Issue #1:
Wrong Data types:


|Column | type |
| ----- | ----- |
| in_reply_to_status_id | float64 |
| in_reply_to_user_id | float64 |
| timestamp| object |
| retweeted_status_id |float64|
| retweeted_status_user_id | float64 |
| retweeted_status_timestamp| object | 



#### Define:
Change column types:

- in_reply_to_status_id to Int64 (nullable integer)
- in_reply_to_user_id to Int64 (nullable integer)
- timestamp to datetime
- retweeted_status_id to Int64 (nullable integer)
- retweeted_status_user_id to Int64 (nullable integer)
- retweeted_status_timestamp to datetime


#### Code

In [16]:
df1_clean.in_reply_to_status_id = df1_clean.in_reply_to_status_id.astype('Int64')
df1_clean.in_reply_to_user_id = df1_clean.in_reply_to_user_id.astype('Int64')
df1_clean.timestamp = df1_clean.timestamp.astype('datetime64[ns]')
df1_clean.retweeted_status_id = df1_clean.retweeted_status_id.astype('Int64')
df1_clean.retweeted_status_user_id = df1_clean.retweeted_status_user_id.astype('Int64')
df1_clean.retweeted_status_timestamp = df1_clean.retweeted_status_timestamp.astype('datetime64[ns]')
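One caveat with the conversion above: the ID columns were read as float64, and float64 cannot represent every 18-digit integer exactly, so casting to Int64 afterwards may not restore the original IDs. The sketch below shows an alternative, not the notebook's approach, that reads ID columns as strings and parses the timestamp with pd.to_datetime. The sample rows are hypothetical (only the first tweet_id and its timestamp are taken from the archive output shown earlier).

```python
import io

import pandas as pd

# Hypothetical two-row sample in the same layout as the archive CSV.
sample_csv = io.StringIO(
    "tweet_id,in_reply_to_status_id,timestamp\n"
    "881536004380872706,,2017-07-02 15:32:16 +0000\n"
    "886267009285017600,886266357075128320,2017-07-15 16:51:35 +0000\n"
)

# Read ID columns as strings so no digits are lost to float64 rounding.
id_cols = ["tweet_id", "in_reply_to_status_id"]
sample = pd.read_csv(sample_csv, dtype={col: str for col in id_cols})

# Parse the timestamp; utc=True yields a timezone-aware datetime column.
sample["timestamp"] = pd.to_datetime(sample["timestamp"], utc=True)

# Demonstrate the precision problem: a float64 round trip changes this ID.
original_id = 881536004380872706
round_tripped = int(float(original_id))
```

Treating IDs as strings also makes sense semantically, since no arithmetic is ever done on them.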

#### Test

In [17]:
df1_clean.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2356 entries, 0 to 2355
Data columns (total 17 columns):
tweet_id                      2356 non-null int64
in_reply_to_status_id         78 non-null Int64
in_reply_to_user_id           78 non-null Int64
timestamp                     2356 non-null datetime64[ns]
source                        2356 non-null object
text                          2356 non-null object
retweeted_status_id           181 non-null Int64
retweeted_status_user_id      181 non-null Int64
retweeted_status_timestamp    181 non-null datetime64[ns]
expanded_urls                 2297 non-null object
rating_numerator              2356 non-null int64
rating_denominator            2356 non-null int64
name                          2356 non-null object
doggo                         2356 non-null object
floofer                       2356 non-null object
pupper                        2356 non-null object
puppo                         2356 non-null object
dtypes: Int64(4), dateti

### Issue #2:
Stage of Dogs Columns with more than one classification

#### Define

Count the dog stage classifications in each row and drop the rows that have more than one.

#### Code

In [18]:
# Count how many of the four stage columns are filled in for each row
stage_cols = ['doggo', 'floofer', 'pupper', 'puppo']
df1_clean['stg_count'] = (df1_clean[stage_cols] != 'None').sum(axis=1)

mask_stg = (df1_clean.stg_count>1)  # rows with more than one stage


In [19]:
(df1_clean.stg_count>1).sum()

14

In [20]:
# Drop the flagged rows; reset_index() keeps the old labels in a new 'index' column
df1_clean = df1_clean.drop(df1_clean[mask_stg].index,axis=0)
df1_clean = df1_clean.reset_index()

In [21]:
df1_clean['stg_count'] = (df1_clean.doggo!='None').astype(int)+(df1_clean.floofer!='None').astype(int)+(df1_clean.pupper!='None').astype(int)+(df1_clean.puppo!='None').astype(int) 
(df1_clean.stg_count>1).sum()

0

In [22]:
df1_clean.doggo=df1_clean.doggo.replace(r'None',np.nan, regex=True)
df1_clean.floofer=df1_clean.floofer.replace(r'None',np.nan , regex=True)
df1_clean.pupper=df1_clean.pupper.replace(r'None', np.nan, regex=True)
df1_clean.puppo=df1_clean.puppo.replace(r'None', np.nan, regex=True)

#### Test

In [23]:
print(df1_clean.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2342 entries, 0 to 2341
Data columns (total 19 columns):
index                         2342 non-null int64
tweet_id                      2342 non-null int64
in_reply_to_status_id         77 non-null Int64
in_reply_to_user_id           77 non-null Int64
timestamp                     2342 non-null datetime64[ns]
source                        2342 non-null object
text                          2342 non-null object
retweeted_status_id           179 non-null Int64
retweeted_status_user_id      179 non-null Int64
retweeted_status_timestamp    179 non-null datetime64[ns]
expanded_urls                 2283 non-null object
rating_numerator              2342 non-null int64
rating_denominator            2342 non-null int64
name                          2342 non-null object
doggo                         83 non-null object
floofer                       9 non-null object
pupper                        245 non-null object
puppo                         2

### Issue #3:

Missing data in the dog name column is stored as the string 'None' instead of NaN (null).

#### Define

Replace the string 'None' in the "name" column with NaN.

#### Code

In [24]:
# Exact-match replacement: only cells equal to 'None' become NaN
df1_clean.name = df1_clean.name.replace('None', np.nan)

#### Test

In [28]:
print(df1_clean.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2342 entries, 0 to 2341
Data columns (total 19 columns):
index                         2342 non-null int64
tweet_id                      2342 non-null int64
in_reply_to_status_id         77 non-null Int64
in_reply_to_user_id           77 non-null Int64
timestamp                     2342 non-null datetime64[ns]
source                        2342 non-null object
text                          2342 non-null object
retweeted_status_id           179 non-null Int64
retweeted_status_user_id      179 non-null Int64
retweeted_status_timestamp    179 non-null datetime64[ns]
expanded_urls                 2283 non-null object
rating_numerator              2342 non-null int64
rating_denominator            2342 non-null int64
name                          1605 non-null object
doggo                         83 non-null object
floofer                       9 non-null object
pupper                        245 non-null object
puppo                         2

### Issue #4:

Retweet rows


#### Define
Keep only the rows where "retweeted_status_id" is null (i.e. drop the retweets), then reset the index.

#### Code

In [29]:
df1_clean = df1_clean[df1_clean.retweeted_status_id.isnull()]
df1_clean  = df1_clean.reset_index(drop=True)

#### Test

In [30]:
df1_clean.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2163 entries, 0 to 2162
Data columns (total 19 columns):
index                         2163 non-null int64
tweet_id                      2163 non-null int64
in_reply_to_status_id         77 non-null Int64
in_reply_to_user_id           77 non-null Int64
timestamp                     2163 non-null datetime64[ns]
source                        2163 non-null object
text                          2163 non-null object
retweeted_status_id           0 non-null Int64
retweeted_status_user_id      0 non-null Int64
retweeted_status_timestamp    0 non-null datetime64[ns]
expanded_urls                 2105 non-null object
rating_numerator              2163 non-null int64
rating_denominator            2163 non-null int64
name                          1490 non-null object
doggo                         75 non-null object
floofer                       9 non-null object
pupper                        224 non-null object
puppo                         24 non-

### Issue #5:

In-reply rows

#### Define
Keep only the rows where "in_reply_to_status_id" is null (i.e. drop the replies), then reset the index.

#### Code


In [31]:
df1_clean = df1_clean[df1_clean.in_reply_to_status_id.isnull()]


In [32]:
df1_clean  = df1_clean.reset_index(drop=True)

#### Test

In [33]:
df1_clean.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2086 entries, 0 to 2085
Data columns (total 19 columns):
index                         2086 non-null int64
tweet_id                      2086 non-null int64
in_reply_to_status_id         0 non-null Int64
in_reply_to_user_id           0 non-null Int64
timestamp                     2086 non-null datetime64[ns]
source                        2086 non-null object
text                          2086 non-null object
retweeted_status_id           0 non-null Int64
retweeted_status_user_id      0 non-null Int64
retweeted_status_timestamp    0 non-null datetime64[ns]
expanded_urls                 2083 non-null object
rating_numerator              2086 non-null int64
rating_denominator            2086 non-null int64
name                          1489 non-null object
doggo                         72 non-null object
floofer                       9 non-null object
pupper                        221 non-null object
puppo                         23 non-nu

### Issue #6:

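Errors extracting the rating numerator, e.g. 5 instead of 13.5


#### Define

Select the rows whose text contains a decimal rating and replace the numerator with a whole number:

- 13.5 --> 13
- 9.75 --> 10
- 11.27 --> 11
- 11.26 --> 11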
11.26-->11 



#### Code

In [34]:
pd.options.display.max_colwidth =150
print(df1_clean.loc[df1_clean.rating_denominator<1,['text','rating_numerator']])

Empty DataFrame
Columns: [text, rating_numerator]
Index: []


In [48]:
df1_clean[df1_clean['text'].str.contains(r'\d+\.\d+\/', case=True, regex=True)]

Unnamed: 0,index,tweet_id,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo,stg_count,dog_stage
41,45,883482846933004288,,,2017-07-08 00:28:19,"<a href=""http://twitter.com/download/iphone"" rel=""nofollow"">Twitter for iPhone</a>","This is Bella. She hopes her smile made you smile. If not, she is also offering you her favorite monkey. 13.5/10 https://t.co/qjrljjt948",,,NaT,"https://twitter.com/dog_rates/status/883482846933004288/photo/1,https://twitter.com/dog_rates/status/883482846933004288/photo/1",5,10,Bella,,,,,0,
523,695,786709082849828864,,,2016-10-13 23:23:56,"<a href=""http://twitter.com/download/iphone"" rel=""nofollow"">Twitter for iPhone</a>","This is Logan, the Chow who lived. He solemnly swears he's up to lots of good. H*ckin magical af 9.75/10 https://t.co/yBO5wuqaPS",,,NaT,https://twitter.com/dog_rates/status/786709082849828864/photo/1,75,10,Logan,,,,,0,
579,763,778027034220126208,,,2016-09-20 00:24:34,"<a href=""http://twitter.com/download/iphone"" rel=""nofollow"">Twitter for iPhone</a>",This is Sophie. She's a Jubilant Bush Pupper. Super h*ckin rare. Appears at random just to smile at the locals. 11.27/10 would smile back https://...,,,NaT,https://twitter.com/dog_rates/status/778027034220126208/photo/1,27,10,Sophie,,,pupper,,1,pupper
1463,1712,680494726643068929,,,2015-12-25 21:06:00,"<a href=""http://twitter.com/download/iphone"" rel=""nofollow"">Twitter for iPhone</a>",Here we have uncovered an entire battalion of holiday puppers. Average of 11.26/10 https://t.co/eNm2S6p9BD,,,NaT,https://twitter.com/dog_rates/status/680494726643068929/photo/1,26,10,,,,,,0,


In [59]:
df1_clean.loc[41,'rating_numerator'] = 13
df1_clean.loc[523,'rating_numerator'] = 10
df1_clean.loc[579,'rating_numerator'] = 11
df1_clean.loc[1463,'rating_numerator'] = 11
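The hard-coded row labels above work for this particular run but break if the index changes. As an alternative sketch, the decimal numerator can be pulled straight from the text with a regular expression. Unlike the notebook, this keeps the decimal value rather than a whole number. The table tweets and its last row are hypothetical; the first three texts are shortened from the output shown above.

```python
import pandas as pd

# Small sample: three shortened texts from the output above plus one
# hypothetical tweet with a whole-number rating.
tweets = pd.DataFrame({
    "text": [
        "she is also offering you her favorite monkey. 13.5/10 https://t.co/qjrljjt948",
        "H*ckin magical af 9.75/10 https://t.co/yBO5wuqaPS",
        "Average of 11.26/10 https://t.co/eNm2S6p9BD",
        "A tweet with a whole-number rating 12/10",
    ],
    "rating_numerator": [5, 75, 26, 12],
})

# Capture "digits.digits" directly in front of the slash; NaN where absent.
decimal = tweets["text"].str.extract(r"(\d+\.\d+)/\d+", expand=False).astype(float)
has_decimal = decimal.notna()

# Store numerators as floats and overwrite only the rows with a decimal rating.
tweets["rating_numerator"] = tweets["rating_numerator"].astype(float)
tweets.loc[has_decimal, "rating_numerator"] = decimal[has_decimal]
```

If whole numbers are preferred, as in the Define step, the column can be rounded afterwards.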


#### Test

In [60]:
df1_clean[df1_clean['text'].str.contains(r'\d+\.\d+\/', case=True, regex=True)]

Unnamed: 0,index,tweet_id,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo,stg_count,dog_stage
41,45,883482846933004288,,,2017-07-08 00:28:19,"<a href=""http://twitter.com/download/iphone"" rel=""nofollow"">Twitter for iPhone</a>","This is Bella. She hopes her smile made you smile. If not, she is also offering you her favorite monkey. 13.5/10 https://t.co/qjrljjt948",,,NaT,"https://twitter.com/dog_rates/status/883482846933004288/photo/1,https://twitter.com/dog_rates/status/883482846933004288/photo/1",13,10,Bella,,,,,0,
523,695,786709082849828864,,,2016-10-13 23:23:56,"<a href=""http://twitter.com/download/iphone"" rel=""nofollow"">Twitter for iPhone</a>","This is Logan, the Chow who lived. He solemnly swears he's up to lots of good. H*ckin magical af 9.75/10 https://t.co/yBO5wuqaPS",,,NaT,https://twitter.com/dog_rates/status/786709082849828864/photo/1,10,10,Logan,,,,,0,
579,763,778027034220126208,,,2016-09-20 00:24:34,"<a href=""http://twitter.com/download/iphone"" rel=""nofollow"">Twitter for iPhone</a>",This is Sophie. She's a Jubilant Bush Pupper. Super h*ckin rare. Appears at random just to smile at the locals. 11.27/10 would smile back https://...,,,NaT,https://twitter.com/dog_rates/status/778027034220126208/photo/1,11,10,Sophie,,,pupper,,1,pupper
1463,1712,680494726643068929,,,2015-12-25 21:06:00,"<a href=""http://twitter.com/download/iphone"" rel=""nofollow"">Twitter for iPhone</a>",Here we have uncovered an entire battalion of holiday puppers. Average of 11.26/10 https://t.co/eNm2S6p9BD,,,NaT,https://twitter.com/dog_rates/status/680494726643068929/photo/1,11,10,,,,,,0,


### Issue #7:
Tweets with "This is a ---" get the dog name "a"


#### Define
Mask the rows whose name column equals 'a' and replace the value with NaN

#### Code
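The notebook contains no code for this issue. A minimal sketch of the step described in the Define section, using a hypothetical four-row table names in place of df1_clean:

```python
import pandas as pd

# Hypothetical sample standing in for the 'name' column of the archive.
names = pd.DataFrame({"name": ["a", "Bella", "a", "Logan"]})

# Series.mask sets entries to NaN wherever the condition is True.
names["name"] = names["name"].mask(names["name"] == "a")
```

Applied to the real table, the same line would be df1_clean['name'] = df1_clean['name'].mask(df1_clean['name'] == 'a').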


In [35]:
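# Tidiness issue #2: combine the four dog stage columns into a single 'dog_stage' column
df1_clean['dog_stage'] = df1_clean.doggo.fillna('')+ df1_clean.floofer.fillna('') + df1_clean.pupper.fillna('') +df1_clean.puppo.fillna('')
# Rows without any stage end up as empty strings; store them as NaN instead
df1_clean.dog_stage=df1_clean.dog_stage.replace(r'^\s*$', np.nan, regex=True)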

## Storing Data
Save gathered, assessed, and cleaned master dataset to a CSV file named "twitter_archive_master.csv".
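A minimal sketch of the save step follows. The table master is a hypothetical stand-in for the cleaned master DataFrame, and the file is written to a temporary folder only so that the example has no side effects.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the cleaned master table.
master = pd.DataFrame({
    "tweet_id": [883482846933004288, 786709082849828864],
    "rating_numerator": [13, 10],
    "rating_denominator": [10, 10],
})

# In the notebook the path would simply be "twitter_archive_master.csv".
out_path = os.path.join(tempfile.mkdtemp(), "twitter_archive_master.csv")
master.to_csv(out_path, index=False)  # index=False avoids an extra unnamed column

# Read the file back to confirm it round-trips.
reloaded = pd.read_csv(out_path)
```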

## Analyzing and Visualizing Data
In this section, analyze and visualize your wrangled data. You must produce at least **three (3) insights and one (1) visualization.**

### Insights:
1.

2.

3.

### Visualization