![YouTubeNews](Images/youtube_news.png)

<h1><center>Social Media Bonanza: YouTube vs News</center></h1>
<h4><center>Authors: Blake Freeman, Jill Smith, Tomeka Morrison, Trong Nguyen</center></h4>
<p>What is the relationship between news and social media? In this project, we analyze news articles and YouTube posts to determine which articles stand out.</p>

In [1]:
import pandas as pd 
import numpy as np 
import datetime as dt
import pymongo
import json

<h3>Import Clean Data & Create DataFrames</h3>

In [2]:
# Paths to the cleaned YouTube and Articles data
youtube_csv = 'Resources/YouTube_Clean.csv'
articles_csv = 'Resources/Articles_Clean.csv'

In [3]:
# Convert YouTube Data into DataFrame 
youtube_df = pd.read_csv(youtube_csv)
youtube_df.head()

Unnamed: 0,Category_ID,Category,Date,Video_Title,Channel,Tags,Total_Views,Total_Likes,Comments_Count
0,22,People & Blogs,2017-11-14,WE WANT TO TALK ABOUT OUR MARRIAGE,CaseyNeistat,SHANtell martin,748374,57527,15954
1,22,People & Blogs,2017-11-14,Me-O Cats Commercial,Nobrand,"cute|""cats""|""thai""|""eggs""",98966,2486,532
2,22,People & Blogs,2017-11-14,"AFFAIRS, EX BOYFRIENDS, $18MILLION NET WORTH -...",Shawn Johnson East,"shawn johnson|""andrew east""|""shawn east""|""shaw...",321053,4451,895
3,22,People & Blogs,2017-11-14,BLIND(folded) CAKE DECORATING CONTEST (with Mo...,Grace Helbig,"itsgrace|""funny""|""comedy""|""vlog""|""grace""|""helb...",197062,7250,456
4,22,People & Blogs,2017-11-14,Wearing Online Dollar Store Makeup For A Week,Safiya Nygaard,wearing online dollar store makeup for a week|...,2744430,115426,6541


In [4]:
# Convert Articles Data into DataFrame 
articles_df = pd.read_csv(articles_csv)
articles_df.head()

Unnamed: 0,Article_ID,Headline,Keywords,News_Category,Pub_Date,Section_Name,Snippet,Media_Type,URL
0,5a7101c110f40f00018be961,"Rhythm of the Streets: ‘We’re Warrior Women, a...","['Bahia (Brazil)', 'Music', 'Women and Girls',...",Travel & Events,1/30/2018 23:37,Unknown,Meet the all-female Brazilian drum group that ...,News,https://www.nytimes.com/2018/01/30/travel/braz...
1,5a70fc1210f40f00018be950,"As Deficit Grows, Congress Keeps Spending","['United States Politics and Government', 'Fed...",News & Politics,1/30/2018 23:13,Politics,Treasury Secretary Steven Mnuchin urged Congre...,News,https://www.nytimes.com/2018/01/30/us/politics...
2,5a70f8f810f40f00018be943,Lesson in Select Bus Service,"['Buses', 'Pennsylvania Station (Manhattan, NY...",News & Politics,1/30/2018 23:00,Unknown,A woman finds out what happens when you don’t ...,News,https://www.nytimes.com/2018/01/30/nyregion/me...
3,5a70eb8110f40f00018be925,Here’s the Real State of the Union,"['State of the Union Message (US)', 'Trump, Do...",People & Blogs,1/30/2018 22:02,Editorials,The reaction against his authoritarian impulse...,Editorial,https://www.nytimes.com/2018/01/30/opinion/edi...
4,5a70d1d210f40f00018be8d9,Good Riddance to Chief Wahoo,"['Baseball', 'Cleveland Indians', 'Western Res...",People & Blogs,1/30/2018 20:13,Unknown,"I’ve lived in Cleveland all my life, and I’m g...",Op-Ed,https://www.nytimes.com/2018/01/30/opinion/chi...


In [5]:
# Rename News_Category and Pub_Date so the column names match the YouTube DataFrame
articles_df = articles_df.rename(columns={'News_Category': 'Category', 'Pub_Date': 'Date'})
articles_df.head()

Unnamed: 0,Article_ID,Headline,Keywords,Category,Date,Section_Name,Snippet,Media_Type,URL
0,5a7101c110f40f00018be961,"Rhythm of the Streets: ‘We’re Warrior Women, a...","['Bahia (Brazil)', 'Music', 'Women and Girls',...",Travel & Events,1/30/2018 23:37,Unknown,Meet the all-female Brazilian drum group that ...,News,https://www.nytimes.com/2018/01/30/travel/braz...
1,5a70fc1210f40f00018be950,"As Deficit Grows, Congress Keeps Spending","['United States Politics and Government', 'Fed...",News & Politics,1/30/2018 23:13,Politics,Treasury Secretary Steven Mnuchin urged Congre...,News,https://www.nytimes.com/2018/01/30/us/politics...
2,5a70f8f810f40f00018be943,Lesson in Select Bus Service,"['Buses', 'Pennsylvania Station (Manhattan, NY...",News & Politics,1/30/2018 23:00,Unknown,A woman finds out what happens when you don’t ...,News,https://www.nytimes.com/2018/01/30/nyregion/me...
3,5a70eb8110f40f00018be925,Here’s the Real State of the Union,"['State of the Union Message (US)', 'Trump, Do...",People & Blogs,1/30/2018 22:02,Editorials,The reaction against his authoritarian impulse...,Editorial,https://www.nytimes.com/2018/01/30/opinion/edi...
4,5a70d1d210f40f00018be8d9,Good Riddance to Chief Wahoo,"['Baseball', 'Cleveland Indians', 'Western Res...",People & Blogs,1/30/2018 20:13,Unknown,"I’ve lived in Cleveland all my life, and I’m g...",Op-Ed,https://www.nytimes.com/2018/01/30/opinion/chi...


<h3>Analysis</h3>

In [6]:
# Convert Date column into date format and remove time
youtube_df['Date'] = pd.to_datetime(youtube_df['Date']).dt.date
youtube_df.head()

Unnamed: 0,Category_ID,Category,Date,Video_Title,Channel,Tags,Total_Views,Total_Likes,Comments_Count
0,22,People & Blogs,2017-11-14,WE WANT TO TALK ABOUT OUR MARRIAGE,CaseyNeistat,SHANtell martin,748374,57527,15954
1,22,People & Blogs,2017-11-14,Me-O Cats Commercial,Nobrand,"cute|""cats""|""thai""|""eggs""",98966,2486,532
2,22,People & Blogs,2017-11-14,"AFFAIRS, EX BOYFRIENDS, $18MILLION NET WORTH -...",Shawn Johnson East,"shawn johnson|""andrew east""|""shawn east""|""shaw...",321053,4451,895
3,22,People & Blogs,2017-11-14,BLIND(folded) CAKE DECORATING CONTEST (with Mo...,Grace Helbig,"itsgrace|""funny""|""comedy""|""vlog""|""grace""|""helb...",197062,7250,456
4,22,People & Blogs,2017-11-14,Wearing Online Dollar Store Makeup For A Week,Safiya Nygaard,wearing online dollar store makeup for a week|...,2744430,115426,6541


In [7]:
# Pull date range 2018-01-05 to 2018-04-30 from YouTube DataFrame (both bounds are exclusive)
youtube_date_df = youtube_df[(youtube_df['Date']>dt.date(2018,1,4)) & (youtube_df['Date']<dt.date(2018,5,1))]
youtube_date_df.head()

Unnamed: 0,Category_ID,Category,Date,Video_Title,Channel,Tags,Total_Views,Total_Likes,Comments_Count
845,22,People & Blogs,2018-01-05,Born with Shortened Limbs and One Eye (Geoff's...,Special Books by Special Kids,"Thalidomide|""Blind""|""Limbs""|""SBSK""|""Education""...",24843,2102,254
846,22,People & Blogs,2018-01-05,BRING IT IN 2018,vlogbrothers,"john green|""history""|""learning""|""education""|""v...",191854,12496,2082
847,22,People & Blogs,2018-01-05,The Link Between Japanese Samurai and Real Indigo,Great Big Story,"great big story|""gbs""|""lag""|""documentary""|""doc...",243258,7871,378
848,22,People & Blogs,2018-01-05,"Tipsy Talk with Anna Kendrick, Anna Camp and B...",Hazel Hayes,"hazel|""hazel hayes""|""chewingsand""|""chewing san...",141009,8851,333
849,22,People & Blogs,2018-01-05,What’s In Zooey In The City’s Bag | Spill It |...,Refinery29,"refinery29|""refinery 29""|""r29""|""r29 video""|""re...",11133,366,50
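
The filter above uses strict `>` and `<` comparisons, so both boundary dates are excluded from the result. As a minimal sketch, an inclusive range check can make the intended window explicit. The helper name `in_date_range` and the sample dates are illustrative assumptions, not part of the original notebook.

```python
import datetime as dt

def in_date_range(day, start, end):
    """Return True when start <= day <= end (both bounds inclusive)."""
    return start <= day <= end

# Hypothetical sample dates on either side of the window boundaries
start, end = dt.date(2018, 1, 5), dt.date(2018, 4, 30)
sample = [dt.date(2018, 1, 4), dt.date(2018, 1, 5),
          dt.date(2018, 4, 30), dt.date(2018, 5, 1)]

# Keep only the dates that fall inside the inclusive window
kept = [d for d in sample if in_date_range(d, start, end)]
print(kept)
```

With the DataFrame, the same check could be applied as `youtube_df[youtube_df['Date'].apply(in_date_range, args=(start, end))]`.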


In [8]:
# Convert Date column into date format and remove time
articles_df['Date'] = pd.to_datetime(articles_df['Date']).dt.date
articles_df.tail()

Unnamed: 0,Article_ID,Headline,Keywords,Category,Date,Section_Name,Snippet,Media_Type,URL
4750,5ae82c93068401528a2ab969,This Common Question Reinforces the Gender Pay...,"['Discrimination', 'Wages and Salaries', 'Labo...",Science & Technology,2018-05-01,Unknown,Several states and cities have ordered employe...,News,https://www.nytimes.com/2018/05/01/upshot/how-...
4751,5ae82c95068401528a2ab96b,"Anna, Llama and Me","['Friendship', 'Dewdney, Anna', 'Writing and W...",Howto & Style,2018-05-01,Family,"The beginning, middle and end of a picture boo...",News,https://www.nytimes.com/2018/05/01/well/family...
4752,5ae82c9d068401528a2ab96d,Gen. Michael Hayden Has One Regret: Russia,"['Classified Information and State Secrets', '...",People & Blogs,2018-05-01,Unknown,"The former N.S.A. and C.I.A. chief on Trump, S...",News,https://www.nytimes.com/2018/05/01/magazine/ge...
4753,5ae82c9f068401528a2ab96f,There Is Nothin’ Like a Tune,"['Books and Literature', 'Purdum, Todd S', 'Th...",Entertainment,2018-05-01,Book Review,"In “Something Wonderful,” Todd S. Purdum analy...",Review,https://www.nytimes.com/2018/05/01/books/revie...
4754,5ae82ca3068401528a2ab97a,Unknown,"['Theater', 'Tony Awards (Theater Awards)', 'A...",Travel & Events,2018-05-01,Unknown,"A pair of two-part productions, “Harry Potter ...",News,https://www.nytimes.com/2018/05/01/theater/ton...


In [9]:
# Pull date range 2018-01-02 to 2018-01-04 from Articles DataFrame (both bounds are exclusive)
articles_date_df = articles_df[(articles_df['Date']>dt.date(2018,1,1)) & (articles_df['Date']<dt.date(2018,1,5))]
articles_date_df.head()

Unnamed: 0,Article_ID,Headline,Keywords,Category,Date,Section_Name,Snippet,Media_Type,URL
760,5a4ebf357c459f29e79b25c4,The Lowdown on the Flaws in Computer Chips,"['Computer Chips', 'Computer Security', 'Cloud...",News & Politics,2018-01-04,Unknown,Hackers can exploit two major security flaws i...,News,https://www.nytimes.com/2018/01/04/technology/...
761,5a4eb8c17c459f29e79b25b9,Application of an Obama-Era Fair-Housing Rule ...,"['Housing and Urban Development Department', '...",Science & Technology,2018-01-04,Unknown,HUD will delay rollout of a measure requiring ...,News,https://www.nytimes.com/2018/01/04/upshot/trum...
872,5a4eb5047c459f29e79b25b4,Donors and Candidates Abandon Bannon After His...,"['Bannon, Stephen K', 'Republican Party', 'Mer...",News & Politics,2018-01-04,Politics,Stephen K. Bannon’s provocative remarks about ...,News,https://www.nytimes.com/2018/01/04/us/politics...
873,5a4eb1f67c459f29e79b25aa,Parachute Jump to Prexy’s,"['Amusement and Theme Parks', 'Coney Island (B...",News & Politics,2018-01-04,Unknown,"Dropping from the sky, and then heading for a ...",News,https://www.nytimes.com/2018/01/04/nyregion/me...
874,5a4ea68b7c459f29e79b258b,Trumpworld Knows He’s An Idiot,"['Trump, Donald J', 'United States Politics an...",People & Blogs,2018-01-04,Unknown,Michael Wolff’s new book shows the cynicism of...,Op-Ed,https://www.nytimes.com/2018/01/04/opinion/fir...


In [10]:
# Merge youtube_date_df with articles_date_df on Category for analysis
# Overlapping column names get the suffixes _youtube and _ny_times
merge_df = youtube_date_df.merge(articles_date_df, on='Category', suffixes=('_youtube', '_ny_times'))
merge_df.head()

Unnamed: 0,Category_ID,Category,Date_youtube,Video_Title,Channel,Tags,Total_Views,Total_Likes,Comments_Count,Article_ID,Headline,Keywords,Date_ny_times,Section_Name,Snippet,Media_Type,URL
0,22,People & Blogs,2018-01-05,Born with Shortened Limbs and One Eye (Geoff's...,Special Books by Special Kids,"Thalidomide|""Blind""|""Limbs""|""SBSK""|""Education""...",24843,2102,254,5a4ea68b7c459f29e79b258b,Trumpworld Knows He’s An Idiot,"['Trump, Donald J', 'United States Politics an...",2018-01-04,Unknown,Michael Wolff’s new book shows the cynicism of...,Op-Ed,https://www.nytimes.com/2018/01/04/opinion/fir...
1,22,People & Blogs,2018-01-05,Born with Shortened Limbs and One Eye (Geoff's...,Special Books by Special Kids,"Thalidomide|""Blind""|""Limbs""|""SBSK""|""Education""...",24843,2102,254,5a4e86a87c459f29e79b253f,Daughters Aren’t the Only Cause,"['Women and Girls', ""Women's Rights"", 'Sexual ...",2018-01-04,Sunday Review,"In 2017, women marched for their daughters. Th...",Op-Ed,https://www.nytimes.com/2018/01/04/opinion/sun...
2,22,People & Blogs,2018-01-05,Born with Shortened Limbs and One Eye (Geoff's...,Special Books by Special Kids,"Thalidomide|""Blind""|""Limbs""|""SBSK""|""Education""...",24843,2102,254,5a4e81e87c459f29e79b2533,"Trapped, and Freed, by the Ice","['Ice', 'Northeastern States (US)', 'Weather']",2018-01-04,Unknown,The forced isolation of a snowstorm is not all...,Op-Ed,https://www.nytimes.com/2018/01/04/opinion/bom...
3,22,People & Blogs,2018-01-05,Born with Shortened Limbs and One Eye (Geoff's...,Special Books by Special Kids,"Thalidomide|""Blind""|""Limbs""|""SBSK""|""Education""...",24843,2102,254,5a4e77e77c459f29e79b250d,Unknown,[],2018-01-04,Unknown,Readers respond.,News,https://www.nytimes.com/2018/01/04/magazine/th...
4,22,People & Blogs,2018-01-05,BRING IT IN 2018,vlogbrothers,"john green|""history""|""learning""|""education""|""v...",191854,12496,2082,5a4ea68b7c459f29e79b258b,Trumpworld Knows He’s An Idiot,"['Trump, Donald J', 'United States Politics an...",2018-01-04,Unknown,Michael Wolff’s new book shows the cynicism of...,Op-Ed,https://www.nytimes.com/2018/01/04/opinion/fir...
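
Merging on `Category` alone pairs every video with every article in the same category, which is why the first video above repeats once per matching article. The sketch below estimates the merged row count before running the merge. The function name `expected_merge_rows` and the toy category lists are assumptions for illustration only.

```python
from collections import Counter

def expected_merge_rows(left_keys, right_keys):
    """Row count of an inner merge on one key: for each key present on
    both sides, multiply its row counts on each side, then sum."""
    left_counts = Counter(left_keys)
    right_counts = Counter(right_keys)
    # Counter returns 0 for keys missing on the right, so they add nothing
    return sum(n * right_counts[key] for key, n in left_counts.items())

# Hypothetical category columns: 3 videos and 4 articles share one category
left = ['People & Blogs'] * 3 + ['Music'] * 2
right = ['People & Blogs'] * 4 + ['News & Politics'] * 5

print(expected_merge_rows(left, right))
```

Assuming no missing categories, `expected_merge_rows(youtube_date_df['Category'], articles_date_df['Category'])` should equal `len(merge_df)`, which gives a quick check on how large the merged frame will be.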


In [45]:
# Convert the date columns to strings; PyMongo cannot encode datetime.date objects
merge_df['Date_youtube'] = merge_df['Date_youtube'].astype(str)
merge_df['Date_ny_times'] = merge_df['Date_ny_times'].astype(str)
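
The cast to `str` is needed because PyMongo cannot encode `datetime.date` values; the trade-off is that the dates are stored as plain text. As an alternative sketch, the dates can be promoted to `datetime.datetime`, which BSON stores as a native date type. The helper `dates_to_datetimes` and the sample record are hypothetical and not part of the original notebook.

```python
import datetime as dt

def dates_to_datetimes(records):
    """Return copies of the records with datetime.date values promoted to
    midnight datetime.datetime values; other values are left unchanged."""
    converted = []
    for record in records:
        new_record = {}
        for key, value in record.items():
            # datetime.datetime is a subclass of datetime.date, so skip it
            if isinstance(value, dt.date) and not isinstance(value, dt.datetime):
                value = dt.datetime.combine(value, dt.time.min)
            new_record[key] = value
        converted.append(new_record)
    return converted

# Hypothetical record shaped like one row of the merged data
sample = [{'Category': 'People & Blogs', 'Date_youtube': dt.date(2018, 1, 5)}]
docs = dates_to_datetimes(sample)
print(docs[0]['Date_youtube'])
```

Under this approach the helper would be applied to the output of `merge_df.to_dict(orient='records')` while the date columns still hold `datetime.date` values.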

<h3>PyMongo Database</h3>

In [49]:
# Connect to the local MongoDB server
conn = 'mongodb://localhost:27017/Social_Media_DB'
client = pymongo.MongoClient(conn)

# Select the database and the collection that will hold the merged records
db = client.socialmedia_db
socialmedia = db.socialmedia

# Insert one document per row of the merged DataFrame
socialmedia.insert_many(merge_df.to_dict(orient='records'))

<pymongo.results.InsertManyResult at 0x25e6a02a948>