# COGS 108 - Data Checkpoint

# Names

- James Larsen
- Alejandro Servin
- Lily Steiner
- Mayra Trejo
- Lucy Lennemann

<a id='research_question'></a>
# Research Question

How has the sentiment of the language surrounding Deafness used by popular online news sources (ABC News, The New York Times, USA Today, The Guardian, Associated Press) changed since the 1980s?

# Dataset(s)

We built our datasets by querying public APIs for news articles related to deafness. Each API returned a list of URLs for articles matching our search. We then scraped the article text and other relevant information from the page at each URL.
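
The scraping step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the scraper we actually ran: `ArticleParser` and `build_record` are hypothetical names, the HTML is a made-up sample, and real news pages would need site-specific handling. It uses only the standard library's `html.parser` to pull the `<title>` and paragraph text out of a page and store them next to the other fields we keep for each article.

```python
from html.parser import HTMLParser


class ArticleParser(HTMLParser):
    """Collect the <title> text and the text of every <p> element."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._current = None  # tag whose text is currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "p"):
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            text = "".join(self._buffer).strip()
            if tag == "title":
                self.title = text
            elif text:
                self.paragraphs.append(text)
            self._current = None


def build_record(source, url, date, html):
    """Return one article record with the five fields stored per article."""
    parser = ArticleParser()
    parser.feed(html)
    return {
        "source": source,
        "url": url,
        "headline": parser.title,
        "date": date,
        "text": " ".join(parser.paragraphs),
    }


# Made-up page used only to illustrate the record layout
sample_html = (
    "<html><head><title>Example headline</title></head>"
    "<body><p>First paragraph.</p><p>Second <b>paragraph</b>.</p></body></html>"
)
record = build_record("Example Source", "https://example.com/a", "2022-01-01", sample_html)
```

A list of such records can be written out with `json.dump`, which matches the JSON files loaded in the Setup section.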

__AP News Articles, ABC News Articles__
- datasets/ap_data.json and datasets/abc_data.json
- These datasets were made by getting article URLs from the Google Custom Search API and then scraping the articles from their news sites
    - https://developers.google.com/custom-search/v1/introduction
- About 380 AP articles and 160 ABC articles

__New York Times Articles__
- datasets/nyt_data.json
- This dataset was made by getting article URLs from the NYT API and then scraping the articles from the NYT site
    - https://developer.nytimes.com/
- About 750 articles

__The Guardian Articles__
- datasets/guard_data.json
- This dataset was made by getting article URLs from The Guardian API and then scraping the articles from The Guardian website
    - https://open-platform.theguardian.com/
- About 7000 articles

__USA Today Articles__
- datasets/usa_data.json
- This dataset is only partially complete and may not be used in the final project. It was made using the Google Custom Search API
    - https://developers.google.com/custom-search/v1/introduction
- Over 1000 total articles


These datasets should be easy to combine because we collected the same fields for every article: the news source, the URL, the headline, the publication date, and the article text.
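
As a rough sketch of how that combination could look, the example below stacks per-source DataFrames with `pd.concat`. The helper name `combine_sources`, the `COLUMNS` constant, and the toy rows are assumptions made for illustration. Passing `utc=True` to `pd.to_datetime` is one way to make dates comparable when some sources store a timezone offset and others do not.

```python
import pandas as pd

COLUMNS = ["headline", "date", "source", "url", "text"]


def combine_sources(frames):
    """Stack per-source DataFrames that share the same five columns."""
    prepared = []
    for df in frames:
        part = df[COLUMNS].copy()
        # utc=True makes timezone-naive and timezone-aware dates comparable
        part["date"] = pd.to_datetime(part["date"], utc=True, errors="coerce")
        prepared.append(part)
    return pd.concat(prepared, ignore_index=True)


# Toy stand-ins for two per-source DataFrames (all values are made up)
toy_a = pd.DataFrame({
    "url": ["https://example.com/1"],
    "headline": ["Headline A"],
    "source": ["Source A"],
    "date": ["2021-11-20 12:59:00"],
    "text": ["Text A"],
})
toy_b = pd.DataFrame({
    "url": ["https://example.com/2"],
    "headline": ["Headline B"],
    "date": ["2022-01-13 12:53:56+00:00"],
    "source": ["Source B"],
    "text": ["Text B"],
})
all_articles = combine_sources([toy_a, toy_b])
```

Keeping the `source` column in the combined frame makes it possible to group by news source later in the analysis.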

# Setup

In [1]:
# Import necessary packages; some are only used later during analysis
import sys
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import json
import nltk
from textblob import TextBlob
from datetime import date


# import datasets example
#with open('') as ds:
#    data=json.load(ds)

In [2]:
# Import Datasets
# Import ABC Dataset
with open('dataset/abc_data.json') as abc_ds:
    abc_data=json.load(abc_ds)
    
# Import Associated Press (AP News) Dataset
with open('dataset/ap_data.json') as ap_ds:
    ap_data=json.load(ap_ds)

# Import The Guardian Dataset
with open('dataset/guard_data.json') as guard_ds:
    guard_data=json.load(guard_ds)
    
# Import New York Times Dataset
with open('dataset/nyt_data.json') as nyt_ds:
    nyt_data=json.load(nyt_ds)

# USA Today dataset is not ready for import yet; scraping takes longer because the API is limited
#with open('dataset/usa_data.json') as usa_ds:
#    usa_data=json.load(usa_ds)

In [3]:
# Load datasets into DataFrames
abc_df = pd.read_json('dataset/abc_data.json')
ap_df = pd.read_json('dataset/ap_data.json') 
guard_df = pd.read_json('dataset/guard_data.json')
nyt_df = pd.read_json('dataset/nyt_data.json') 
#usa_df=pd.read_json('dataset/usa_data.json') 


In [4]:
# Set row and column display
pd.options.display.max_rows=6
pd.options.display.max_columns=5

# Used to inspect full text for errors; reverted to the truncated width below
#pd.options.display.max_colwidth=None 

pd.options.display.max_colwidth=40

In [5]:
# List of DataFrames to iterate over when applying shared cleaning steps
# TODO: add usa_df once it is ready
df_list = [abc_df, ap_df, guard_df, nyt_df]


# Data Cleaning

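For each dataset, we reorder the columns, convert `date` to datetime, filter out articles published before 1980 where needed, check for null values, and inspect the unique headlines and article text for scraping errors. For AP, we also remove articles that only report sports scores. Cleaning of the article text itself has not been implemented yet.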
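
Because every source needs the same column order, date parsing, and 1980 cutoff, these steps could be wrapped in one helper and applied to each DataFrame in a loop. The sketch below is one possible way to do that, not the code used in this notebook: `basic_clean`, `COLUMNS`, `CUTOFF`, and the toy data are hypothetical. Note that, unlike the per-dataset cells, this version also drops rows whose date fails to parse.

```python
import pandas as pd

COLUMNS = ["headline", "date", "source", "url", "text"]
CUTOFF = pd.Timestamp("1980-01-01", tz="UTC")


def basic_clean(df):
    """Reorder columns, parse dates, and keep articles dated on or after the cutoff."""
    out = df[COLUMNS].copy()
    out["date"] = pd.to_datetime(out["date"], utc=True, errors="coerce")
    # Boolean indexing returns a new DataFrame, so the result has to be kept.
    # Rows whose date could not be parsed (NaT) compare as False and are dropped.
    out = out[out["date"] >= CUTOFF]
    return out.reset_index(drop=True)


# Toy frame standing in for a real per-source DataFrame (values are made up)
toy_frames = {
    "old_and_new": pd.DataFrame({
        "url": ["https://example.com/1", "https://example.com/2"],
        "headline": ["Old article", "New article"],
        "source": ["Example", "Example"],
        "date": ["1954-11-12 15:10:17+00:00", "2022-01-27 19:51:16+00:00"],
        "text": ["old text", "new text"],
    }),
}

# A dict keeps each cleaned frame reachable by name after the loop
cleaned = {name: basic_clean(df) for name, df in toy_frames.items()}
```

Iterating over a plain list and reassigning the loop variable would not update the original variables, which is why the sketch stores the results in a new dict.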

### ABC Dataset

In [6]:
#visualize dataframe
abc_df                                      

Unnamed: 0,url,headline,source,date,text
0,https://abcnews.go.com/US/wireStory/...,Prosecutor: Alex Murdaugh now faces ...,ABC News,2022-01-21 18:17:00,"COLUMBIA, S.C. -- A once-prominent S..."
1,https://abcnews.go.com/US/undefeated...,Undefeated: Deaf football team bring...,ABC News,2021-11-20 12:59:00,"Once considered underdogs, the footb..."
2,https://abcnews.go.com/US/referee-ac...,Referee accused of discriminating ag...,ABC News,2021-12-30 02:53:00,The American Civil Liberties Union i...
...,...,...,...,...,...
158,https://abcnews.go.com/US/dunwoody-d...,Dunwoody Day Care Trial: Widow 'Didn...,ABC News,2012-02-24 16:02:00,"ATLANTA, Feb. 24, 2012 — -- A witnes..."
159,https://abcnews.go.com/US/story?id=9...,Police Investigate Deaf Student Homi...,ABC News,2006-01-07 15:05:00,"Feb. 4, 2001 -- A student found dead..."
160,https://abcnews.go.com/US/story?id=9...,Rush Limbaugh Suffers Hearing Loss -...,ABC News,2006-01-07 15:26:00,"Oct. 8, 2001 -- Rush Limbaugh, who's..."


In [7]:
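# Reorganize columns (copy so the later column assignment does not act on a slice)
abc_df = abc_df[['headline','date','source','url','text']].copy()

# Convert 'date' to datetime format; unparseable dates become NaT
abc_df['date'] = pd.to_datetime(abc_df['date'], errors='coerce')

# Keep only articles dated after 1980-01-01 (the result must be assigned to take effect)
abc_df = abc_df[~(abc_df['date'] <= '1980-01-01')]

# Display without the 'source' column for easier viewing (abc_df itself keeps the column)
abc_df.drop(columns=['source'])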


Unnamed: 0,headline,date,url,text
0,Prosecutor: Alex Murdaugh now faces ...,2022-01-21 18:17:00,https://abcnews.go.com/US/wireStory/...,"COLUMBIA, S.C. -- A once-prominent S..."
1,Undefeated: Deaf football team bring...,2021-11-20 12:59:00,https://abcnews.go.com/US/undefeated...,"Once considered underdogs, the footb..."
2,Referee accused of discriminating ag...,2021-12-30 02:53:00,https://abcnews.go.com/US/referee-ac...,The American Civil Liberties Union i...
...,...,...,...,...
158,Dunwoody Day Care Trial: Widow 'Didn...,2012-02-24 16:02:00,https://abcnews.go.com/US/dunwoody-d...,"ATLANTA, Feb. 24, 2012 — -- A witnes..."
159,Police Investigate Deaf Student Homi...,2006-01-07 15:05:00,https://abcnews.go.com/US/story?id=9...,"Feb. 4, 2001 -- A student found dead..."
160,Rush Limbaugh Suffers Hearing Loss -...,2006-01-07 15:26:00,https://abcnews.go.com/US/story?id=9...,"Oct. 8, 2001 -- Rush Limbaugh, who's..."


In [8]:
#look for null values
abc_df.isnull().sum()

headline    0
date        0
source      0
url         0
text        0
dtype: int64

In [9]:
#Comb for unique values in the 'headline' column
abc_df['headline'].unique()

array(['Prosecutor: Alex Murdaugh now faces 71 charges; $8.5M stolen ...',
       'Undefeated: Deaf football team brings triumph and pride to ...',
       'Referee accused of discriminating against deaf wrestler in state ...',
       'Preserving Black American Sign Language in the Deaf community ...',
       'Baby born deaf has touching reaction to hearing music for 1st time ...',
       "Deaf Costco worker with mumbling manager won't get award - ABC ...",
       'Today in History - ABC News',
       "Scenes from Week 1 of Ghislaine Maxwell's sex-abuse trial - ABC ...",
       'Police officer dies from COVID-19 just 3 months after retirement ...',
       'Man released from prison after 48 years in court compromise - ABC ...',
       'Liberty Univ associate professor charged with sexual battery - ABC ...',
       'Report Warns of Terror Unpreparedness - ABC News',
       "Epstein's former house manager testifies, calls Ghislaine Maxwell ...",
       "Nobel doctor calls sexual violence i

In [10]:
#Comb text for unique values in the 'text' column
abc_df['text'].unique()

array(["COLUMBIA, S.C. -- A once-prominent South Carolina lawyer now faces 71 charges that he stole nearly $8.5 million in wrongful death and wreck settlements from more than a dozen people after another round of indictments against         Alex Murdaugh were handed up Friday.The 23 new charges issued by the state grand jury covered new victims but similar schemes, prosecutors said.Murdaugh, 53, would negotiate settlement money for his clients without telling them what they earned, then deposit the checks meant to pay for their pain and suffering or the anguish of the death of a loved one into his own personal accounts — paying off loans or debts or in ways prosecutors have not detailed.The new indictments extend Murdaugh's crimes back more than a decade to 2011 and add a new mystery. Several of them said Murdaugh used money orders given to an unnamed family member to get his hands on the cash, prosecutors said.Murdaugh has been in jail since October for the ever-growing list of breach

In [11]:
#Clean text
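
The text previews above show two recurring scraping artifacts: runs of whitespace (including leading tabs) and sentences joined without a space, as in `Friday.The`. A minimal sketch of a cleaning function for those two cases follows. `clean_text` is a hypothetical helper, and the sentence-boundary rule is a heuristic that can misfire on strings such as `Ph.D`, so it would need checking against the real articles.

```python
import re


def clean_text(text):
    """Normalize whitespace and re-insert spaces lost between sentences."""
    # Collapse tabs, newlines, and runs of spaces into single spaces
    text = re.sub(r"\s+", " ", text).strip()
    # Add a space after ., ! or ? when a lowercase letter precedes it and an
    # uppercase letter follows it, e.g. "Friday.The" -> "Friday. The"
    text = re.sub(r"(?<=[a-z])([.!?])(?=[A-Z])", r"\1 ", text)
    return text


# Short made-up strings modeled on the artifacts seen in the previews
joined = clean_text("handed up  Friday.The 23 new charges")
tabbed = clean_text("\t   WASHINGTON (AP) _ Scores of people")
abbreviation = clean_text("COLUMBIA, S.C. -- A lawyer")
```

On a DataFrame this could be applied with `abc_df['text'].apply(clean_text)`, with the result assigned back to the column.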


### Associated Press (AP News) Dataset

In [12]:
#visualize dataframe
ap_df                                      

Unnamed: 0,url,headline,date,source,text
0,https://apnews.com/article/lifestyle...,2 hurt when part of student center c...,2022-01-13 12:53:56+00:00,AP News,"TALLADEGA, Ala. (AP) — Two workers w..."
1,https://apnews.com/article/georgia-u...,Georgia hospital agrees to measures ...,2022-01-09 15:58:08+00:00,AP News,"CALHOUN, Ga. (AP) — Federal authorit..."
2,https://apnews.com/article/sports-ba...,Wednesday's Scores | AP News,2022-01-27 05:20:44+00:00,AP News,"BOYS PREP BASKETBALL=Ash Fork 50, Gr..."
...,...,...,...,...,...
374,https://apnews.com/article/7bd22eafb...,Blind Lobby for Bill to Ban Seating ...,1990-02-07 07:41:00+00:00,AP News,\t WASHINGTON (AP) _ Scores of bli...
375,https://apnews.com/article/048cc469c...,Judaism in Silence: New Sign Languag...,1985-05-07 18:06:00+00:00,AP News,"\t NEWARK, N.J. (AP) _ Naomi Mille..."
376,https://apnews.com/article/03d6ff92f...,No One Took Girl's Threats Seriously...,1985-12-04 19:16:00+00:00,AP News,"\t SPANAWAY, Wash. (AP) _ Danny Ga..."


In [13]:
# Reorganize columns (copy so the later column assignment does not act on a slice)
ap_df = ap_df[['headline','date','source','url','text']].copy()

# Convert 'date' to datetime format
ap_df['date'] = pd.to_datetime(ap_df['date'])

# Keep only articles dated after 1980-01-01 (the result must be assigned to take effect)
ap_df = ap_df[~(ap_df['date'] <= '1980-01-01')]

# Display without the 'source' column for easier viewing (ap_df itself keeps the column)
ap_df.drop(columns=['source'])


Unnamed: 0,headline,date,url,text
0,2 hurt when part of student center c...,2022-01-13 12:53:56+00:00,https://apnews.com/article/lifestyle...,"TALLADEGA, Ala. (AP) — Two workers w..."
1,Georgia hospital agrees to measures ...,2022-01-09 15:58:08+00:00,https://apnews.com/article/georgia-u...,"CALHOUN, Ga. (AP) — Federal authorit..."
2,Wednesday's Scores | AP News,2022-01-27 05:20:44+00:00,https://apnews.com/article/sports-ba...,"BOYS PREP BASKETBALL=Ash Fork 50, Gr..."
...,...,...,...,...
374,Blind Lobby for Bill to Ban Seating ...,1990-02-07 07:41:00+00:00,https://apnews.com/article/7bd22eafb...,\t WASHINGTON (AP) _ Scores of bli...
375,Judaism in Silence: New Sign Languag...,1985-05-07 18:06:00+00:00,https://apnews.com/article/048cc469c...,"\t NEWARK, N.J. (AP) _ Naomi Mille..."
376,No One Took Girl's Threats Seriously...,1985-12-04 19:16:00+00:00,https://apnews.com/article/03d6ff92f...,"\t SPANAWAY, Wash. (AP) _ Danny Ga..."


In [14]:
# Look for null values
ap_df.isnull().sum()

headline    0
date        0
source      0
url         0
text        0
dtype: int64

In [15]:
#Comb for unique values in the 'headline' column
ap_df['headline'].unique()

array(['2 hurt when part of student center collapses at deaf school | AP News',
       'Georgia hospital agrees to measures to help deaf patients | AP News',
       "Wednesday's Scores | AP News",
       'New NYC mayor says kids safe in school despite virus surge | AP ...',
       "Tuesday's Scores | AP News", "Monday's Scores | AP News",
       "Saturday's Scores | AP News", "Thursday's Scores | AP News",
       "Friday's Scores | AP News",
       'Mickey Guyton, Jhené Aiko, Mary Mary to sing at Super Bowl | AP ...',
       'Omicron surge is undermining care for other health problems | AP ...',
       'West Virginia lawmakers introduce 15-week abortion ban | AP News',
       'Ómicron trastoca el regreso a las escuelas en EEUU | AP News',
       'Prosecutor: Alex Murdaugh now faces 71 charges; $8.5M stolen | AP ...',
       'What to watch out for when Oscar noms are announced Tuesday ...',
       'New this week: Mary J. Blige, Jennifer Lopez and Puppy Bowl | AP ...',
       '2022 SAG n

In [16]:
#Comb for unique values in the 'text' column
ap_df['text'].unique()

array(['TALLADEGA, Ala. (AP) — Two workers were injured when part of a building collapsed at the Alabama School for the Deaf, raining down bricks and other material, authorities said. The failure happened Wednesday morning at the student center of the school, which is part of the Alabama Institute for Deaf and Blind, according to a statement from Talladega Fire and Rescue released on social media. The weight of a lift collapsed the floor below an area where people were working on the building, the statement said. The two who were injured weren’t trapped and were taken for treatment by ambulance, but further information about their condition wasn’t immediately available.The school said no students were in the area at the time of the accident, and school workers including a nurse and security staff assisted afterward. “The building structure is being evaluated to determine if there are ongoing safety issues. The building is currently secured and all affected areas have been blocked to pr

In [36]:
# Remove articles that only report sports scores (assign the result so the rows stay removed)
ap_df = ap_df[~ap_df['headline'].str.contains("Monday's Scores|Tuesday's Scores|Wednesday's Scores|Thursday's Scores|Friday's Scores|Saturday's Scores|Sunday's Scores")]
ap_df

#Clean text

Unnamed: 0,headline,date,source,url,text
0,2 hurt when part of student center c...,2022-01-13 12:53:56+00:00,AP News,https://apnews.com/article/lifestyle...,"TALLADEGA, Ala. (AP) — Two workers w..."
1,Georgia hospital agrees to measures ...,2022-01-09 15:58:08+00:00,AP News,https://apnews.com/article/georgia-u...,"CALHOUN, Ga. (AP) — Federal authorit..."
3,New NYC mayor says kids safe in scho...,2022-01-03 17:06:08+00:00,AP News,https://apnews.com/article/coronavir...,NEW YORK (AP) — New York City school...
...,...,...,...,...,...
374,Blind Lobby for Bill to Ban Seating ...,1990-02-07 07:41:00+00:00,AP News,https://apnews.com/article/7bd22eafb...,\t WASHINGTON (AP) _ Scores of bli...
375,Judaism in Silence: New Sign Languag...,1985-05-07 18:06:00+00:00,AP News,https://apnews.com/article/048cc469c...,"\t NEWARK, N.J. (AP) _ Naomi Mille..."
376,No One Took Girl's Threats Seriously...,1985-12-04 19:16:00+00:00,AP News,https://apnews.com/article/03d6ff92f...,"\t SPANAWAY, Wash. (AP) _ Danny Ga..."


### The Guardian Dataset

In [17]:
#visualize dataframe
guard_df                                        

Unnamed: 0,url,date,source,headline,text
0,https://www.theguardian.com/society/...,2022-01-27 19:51:16+00:00,The Guardian,British Sign Language to become reco...,British Sign Language (BSL) is on co...
1,https://www.theguardian.com/society/...,2021-12-09 18:10:02+00:00,The Guardian,Scottish health board apologises ove...,A Scottish health board has apologis...
2,https://www.theguardian.com/tv-and-r...,2022-01-10 17:36:39+00:00,The Guardian,Strictly: sign language interpreter ...,She was the first deaf contestant an...
...,...,...,...,...,...
6838,https://www.theguardian.com/theguard...,1954-11-12 15:10:17+00:00,The Guardian,Pensioners demand £2 10s a week - fr...,About four thousand old age pensione...
6839,https://www.theguardian.com/theobser...,1932-05-22 13:38:00+00:00,The Guardian,First woman to fly the Atlantic,"Miss Amelia Earhart, the American fl..."
6840,https://www.theguardian.com/world/18...,1865-02-07 02:35:09+00:00,The Guardian,Beethoven conducts Fidelio,Extracts from Louis Spohr's autobiog...


In [18]:
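# Reorganize columns (copy so the later column assignment does not act on a slice)
guard_df = guard_df[['headline','date','source','url','text']].copy()

# Convert 'date' to datetime format
guard_df['date'] = pd.to_datetime(guard_df['date'])

# Keep articles dated after 1980-01-01 in a separate frame; guard_df itself stays
# unfiltered, so the inspection cells below still cover every scraped article
guard_1980_df = guard_df[~(guard_df['date'] <= '1980-01-01')]

# Display without the 'source' column for easier viewing (guard_df itself keeps the column)
guard_df.drop(columns=['source'])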


Unnamed: 0,headline,date,url,text
0,British Sign Language to become reco...,2022-01-27 19:51:16+00:00,https://www.theguardian.com/society/...,British Sign Language (BSL) is on co...
1,Scottish health board apologises ove...,2021-12-09 18:10:02+00:00,https://www.theguardian.com/society/...,A Scottish health board has apologis...
2,Strictly: sign language interpreter ...,2022-01-10 17:36:39+00:00,https://www.theguardian.com/tv-and-r...,She was the first deaf contestant an...
...,...,...,...,...
6838,Pensioners demand £2 10s a week - fr...,1954-11-12 15:10:17+00:00,https://www.theguardian.com/theguard...,About four thousand old age pensione...
6839,First woman to fly the Atlantic,1932-05-22 13:38:00+00:00,https://www.theguardian.com/theobser...,"Miss Amelia Earhart, the American fl..."
6840,Beethoven conducts Fidelio,1865-02-07 02:35:09+00:00,https://www.theguardian.com/world/18...,Extracts from Louis Spohr's autobiog...


In [19]:
# Look for null values
guard_df.isnull().sum()

headline    0
date        0
source      0
url         0
text        0
dtype: int64

In [20]:
#Comb for unique values in the 'headline' column
guard_df['headline'].unique()

array(['British Sign Language to become recognised language in the UK ',
       'Scottish health board apologises over late diagnosis of deaf children',
       'Strictly: sign language interpreter to be projected on to big screens at live shows',
       ..., 'Pensioners demand £2 10s a week - from taxation',
       'First woman to fly the Atlantic', 'Beethoven conducts  Fidelio'],
      dtype=object)

In [21]:
#Comb for unique values in the 'text' column
guard_df['text'].unique()

array(['British Sign Language (BSL) is on course to become a recognised language, after the government backed a proposal by a Labour MP.The private member’s bill, introduced by Rosie Cooper, aims to improve accessibility for deaf people and would see the promotion of BSL when making public service announcements.It would also see the launch of an advisory board of BSL users to offer guidance to the Department for Work and Pensions (DWP) on how and when to use it and look at increasing the number of BSL interpreters.It will encourage government departments and public bodies to follow the guidance, giving deaf people “equal access to education, employment, public services such as the NHS”, according to the British Deaf Association (BDA).DWP minister Chloe Smith said: “Effective communication is vital to creating a more inclusive and accessible society, and legally recognising British Sign Language in Great Britain is a significant step towards ensuring that deaf people are not excluded fr

In [22]:
#Clean text

### New York Times Dataset

In [23]:
#visualize dataframe
nyt_df                                       

Unnamed: 0,url,headline,date,source,text
0,https://www.nytimes.com/2021/11/19/p...,How the Beatles Broke Up and the Dea...,2021-11-19 10:30:02+00:00,The New York Times,"This weekend, listen to a collection..."
1,https://www.nytimes.com/2021/10/10/o...,Don’t Fear a Deafer Planet,2021-10-10 15:00:07+00:00,The New York Times,"In Deaf culture, we have a rich stor..."
2,https://www.nytimes.com/2021/10/01/s...,"R. Allen Gardner, 91, Dies; Taught S...",2021-10-01 16:37:51+00:00,The New York Times,Washoe was 10 months old when her fo...
...,...,...,...,...,...
744,https://www.nytimes.com/1985/07/29/o...,Where Chimpanzees Use Sign Language,1985-07-29 05:00:00+00:00,The New York Times,Credit...The New York Times Archives...
745,https://www.nytimes.com/1984/02/06/o...,FIRE ALARMS FOR THE DEAF,1984-02-06 05:00:00+00:00,The New York Times,Credit...The New York Times Archives...
746,https://www.nytimes.com/1982/07/22/o...,DEAF AND SAFE DRIVERS,1982-07-22 05:00:00+00:00,The New York Times,Credit...The New York Times Archives...


In [24]:
# Reorganize columns (copy so the later column assignment does not act on a slice)
nyt_df = nyt_df[['headline','date','source','url','text']].copy()

# Convert 'date' to datetime format
nyt_df['date'] = pd.to_datetime(nyt_df['date'])

# Display without the 'source' column for easier viewing (nyt_df itself keeps the column)
nyt_df.drop(columns=['source'])

# Visualize 'text' to search for errors
#nyt_df['text']

Unnamed: 0,headline,date,url,text
0,How the Beatles Broke Up and the Dea...,2021-11-19 10:30:02+00:00,https://www.nytimes.com/2021/11/19/p...,"This weekend, listen to a collection..."
1,Don’t Fear a Deafer Planet,2021-10-10 15:00:07+00:00,https://www.nytimes.com/2021/10/10/o...,"In Deaf culture, we have a rich stor..."
2,"R. Allen Gardner, 91, Dies; Taught S...",2021-10-01 16:37:51+00:00,https://www.nytimes.com/2021/10/01/s...,Washoe was 10 months old when her fo...
...,...,...,...,...
744,Where Chimpanzees Use Sign Language,1985-07-29 05:00:00+00:00,https://www.nytimes.com/1985/07/29/o...,Credit...The New York Times Archives...
745,FIRE ALARMS FOR THE DEAF,1984-02-06 05:00:00+00:00,https://www.nytimes.com/1984/02/06/o...,Credit...The New York Times Archives...
746,DEAF AND SAFE DRIVERS,1982-07-22 05:00:00+00:00,https://www.nytimes.com/1982/07/22/o...,Credit...The New York Times Archives...


In [25]:
#Look for null values
nyt_df.isnull().sum()

headline    0
date        0
source      0
url         0
text        0
dtype: int64

In [26]:
#Comb for unique values in the 'headline' column
nyt_df['headline'].unique()

array(['How the Beatles Broke Up and the Deaf Football Team Taking California by Storm: The Week in Narrated Articles',
       'Don’t Fear a Deafer Planet',
       'R. Allen Gardner, 91, Dies; Taught Sign Language to a Chimp Named Washoe',
       'Barbara Kannapell, Activist Who Empowered Deaf People, Dies at 83',
       'Lesson of the Day: ‘Black, Deaf and Extremely Online’',
       'Black, Deaf and Extremely Online',
       'I Think Beethoven Encoded His Deafness in His Music',
       'Mothering While Deaf in a Newly Quiet World',
       'A Deaf-Blind Dishwasher Achieves His Childhood Dream: Movie Actor',
       'The Queer, Half-Deaf Actor Redefining the Idea of a Leading Man',
       'Giannis Antetokounmpo Is Called Amazing. Now in Sign Language, Too.',
       'University Denounced for Showing Sign Language for ‘Jewish’ as a Hooked Nose',
       'Harlan Lane, Vigorous Advocate for Deaf Culture, Dies at 82',
       'At Banks and Fund Firms, Access Is Too Often Denied, Blind and Deaf 

In [27]:
#Comb for unique values in the 'text' column
nyt_df['text'].unique()

array(['This weekend, listen to a collection of narrated articles from around The New York Times, read aloud by the reporters who wrote them.Know How the Beatles Ended? Peter Jackson May Change Your Mind.Written and narrated by Ben SisarioKnow How the Beatles Ended? Peter Jackson May Change Your Mind.{"@context":"http://schema.org","@type":"AudioObject","@id":"https://static.nytimes.com/podcasts/2021/11/15/arts/14beatles-audio/211111-beatles-ended-peter-jackson-nyt-audm.mp3","description":"","name":"Know How the Beatles Ended? Peter Jackson May Change Your Mind.","contentUrl":"https://static.nytimes.com/podcasts/2021/11/15/arts/14beatles-audio/211111-beatles-ended-peter-jackson-nyt-audm.mp3","duration":"PT0.96S"}Peter Jackson’s three-part documentary “The Beatles: Get Back” explores the most contested period in the band’s history.“It’s sort of that one impossible fan dream,” Jackson said in a video interview from Wellington, New Zealand, where he has spent much of the last four years i

In [28]:
#Clean text
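
The NYT text preview above contains an embedded JSON metadata object (starting with `{"@context"`) in the middle of the article body. One possible way to strip such objects is sketched below, assuming they are valid JSON and always begin with that marker. `strip_embedded_json` and `MARKER` are hypothetical names, and the sample string is a shortened, made-up variant of the preview. The sketch uses `json.JSONDecoder.raw_decode`, which parses a single JSON value and reports where it ends, so nested braces are handled.

```python
import json

MARKER = '{"@context"'


def strip_embedded_json(text):
    """Remove JSON objects that start with {"@context" from scraped text."""
    decoder = json.JSONDecoder()
    pieces = []
    pos = 0
    while True:
        start = text.find(MARKER, pos)
        if start == -1:
            pieces.append(text[pos:])
            break
        pieces.append(text[pos:start])
        try:
            # raw_decode parses one JSON value and returns the index after it
            _, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            # Not valid JSON: keep the marker text and continue after it
            pieces.append(MARKER)
            end = start + len(MARKER)
        pos = end
    return "".join(pieces)


# Shortened, made-up sample modeled on the preview above
sample = (
    'Change Your Mind.'
    '{"@context":"http://schema.org","@type":"AudioObject","duration":"PT0.96S"}'
    'Peter Jackson explores the period.'
)
stripped = strip_embedded_json(sample)
unchanged = strip_embedded_json('a{"@context" broken')
```

Text that merely contains the marker without a parseable object is left as it was, so the function should not delete article content by accident.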

### USA Today Dataset

In [29]:
##create dataframe using dataset

##visualize dataframe
#usa_df                                       

In [30]:
## Reorganize columns (copy so the later column assignment does not act on a slice)
#usa_df = usa_df[['headline','date','source','url','text']].copy()

## Convert 'date' to datetime format
#usa_df['date']=pd.to_datetime(usa_df['date'])

## Keep only articles dated after 1980-01-01 (the result must be assigned to take effect)
#usa_df = usa_df[~(usa_df['date']<='1980-01-01')]

## Display without the 'source' column for easier viewing
#usa_df.drop(columns=['source'])

##Find data types
#usa_df.dtypes

In [31]:
##look for null values
#usa_df.isnull().sum()

In [32]:
#Comb for unique values in the 'headline' column
#usa_df['headline'].unique()

In [33]:
##Comb for unique values in the 'text' column
#usa_df['text'].unique()

In [34]:
#Clean Text