# Data Acquisition Exercises

By the end of this exercise, you should have a file named acquire.py that contains the specified functions. If you wish, you may break your work into separate files for each website (e.g. acquire_codeup_blog.py and acquire_news_articles.py), but the end function should be present in acquire.py (that is, acquire.py should import get_blog_articles from the acquire_codeup_blog module).

In [1]:
from bs4 import BeautifulSoup
import numpy as np
import pandas as pd
import mason_functions as mf
import requests

### Soup Methods
* soup.select('.class'): get all the elements with the CSS class named class
* soup.select_one('.class'): get the first element with the CSS class named class
* soup.h2: get the first h2 element
* soup.find_all('h2'): get all the elements with the tag name h2
* soup('h2'): shorthand for the find_all call above
* soup.find('h2'): get the first h2 element (same result as soup.h2)
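
The methods listed above can be tried without any network request. The sketch below runs them against a small hand-written page; the markup is hypothetical and only meant to show what each call returns.

```python
from bs4 import BeautifulSoup

# a tiny hand-written page (hypothetical markup, not taken from any real site)
html = '''
<html><body>
  <h2 class="entry-title">First post</h2>
  <h2 class="entry-title">Second post</h2>
  <p class="summary">Some text</p>
</body></html>
'''

soup = BeautifulSoup(html, 'html.parser')

all_titles = soup.select('.entry-title')         # list of every element with class entry-title
first_title = soup.select_one('.entry-title')    # first match only (None if nothing matches)
first_h2 = soup.h2                               # first h2 element
all_h2s = soup.find_all('h2')                    # list of every h2 element
same_h2s = soup('h2')                            # calling the soup is shorthand for find_all
found_h2 = soup.find('h2')                       # first h2 element, like soup.h2

print([tag.text for tag in all_titles])
print(first_title.text, '|', first_h2.text, '|', found_h2.text)
print(all_h2s == same_h2s)
```

Note that select and find_all return lists, while select_one, find, and attribute access such as soup.h2 return a single element or None.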

## Exercise I Codeup Blog Articles

Visit Codeup's Blog and record the urls for at least 5 distinct blog posts. For each post, you should scrape at least the post's title and content.

Encapsulate your work in a function named get_blog_articles that will return a list of dictionaries, with each dictionary representing one article. The shape of each dictionary should look like this:


{
    'title': 'the title of the article',
    'content': 'the full text content of the article'
}

Plus any additional properties you think might be helpful.
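
As a rough illustration of the dictionary shape described above, the sketch below turns one article page into one dictionary. The helper name parse_article is hypothetical, the .entry-content selector is an assumption about the article markup, and the sample HTML is a made-up stand-in for a downloaded page.

```python
from bs4 import BeautifulSoup

def parse_article(title, html):
    """Return one article as a dictionary with 'title' and 'content' keys."""
    soup = BeautifulSoup(html, 'html.parser')
    body = soup.select_one('.entry-content')    # assumed selector for the article body
    content = body.get_text().strip() if body is not None else ''
    return {'title': title, 'content': content}

# hypothetical stand-in for the HTML of a downloaded article page
sample_html = '<div class="entry-content"><p>Hello, readers.</p></div>'

# a list of dictionaries, one per article
articles = [parse_article('A sample post', sample_html)]
print(articles)
```

Keeping the parsing separate from the download step makes it possible to test the function on saved or hand-written HTML.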

#### Bonus: Scrape the text of all the articles linked on Codeup's blog page.

In [2]:
# define url to scrape
url = 'https://codeup.com/blog/'

# define headers
headers = {'User-Agent': 'Codeup Data Server'}

# get a response from the server
response = requests.get(url, headers = headers)

# assign variable to text requested
html = response.text

# view html text
# html

In [3]:
# convert html into a good soup
soup = BeautifulSoup(html)

# print beautiful soup
# print(soup.prettify())

In [4]:
# what's here?
h2s = soup.find('h2').text
h2s

'VET TEC Funding Now Available For Dallas Veterans'

In [5]:
# select .entry-title elements
entry_titles = soup.select('.entry-title')
entry_titles

[<h2 class="entry-title"><a href="https://codeup.com/codeup-news/codeup-start-dates-for-march-2022/">Codeup Start Dates for March 2022</a></h2>,
 <h2 class="entry-title"><a href="https://codeup.com/codeup-news/vet-tec-funding-dallas/">VET TEC Funding Now Available For Dallas Veterans</a></h2>,
 <h2 class="entry-title"><a href="https://codeup.com/codeup-news/dallas-campus-re-opens-with-new-grant-partner/">Dallas Campus Re-opens With New Grant Partner</a></h2>,
 <h2 class="entry-title"><a href="https://codeup.com/dallas-newsletter/codeup-dallas-open-house/">Codeup Dallas Open House</a></h2>,
 <h2 class="entry-title"><a href="https://codeup.com/codeup-news/codeups-placement-team-continues-setting-records/">Codeup’s Placement Team Continues Setting Records</a></h2>,
 <h2 class="entry-title"><a href="https://codeup.com/it-training/it-certifications-101/">IT Certifications 101: Why They Matter, and Why They Don’t</a></h2>,
 <h2 class="entry-title"><a href="https://codeup.com/cybersecurity/a-

In [6]:
# get string for title
soup.title.string

'Blog - Codeup'

In [7]:
# understand links
link = entry_titles[0].a['href']
link

'https://codeup.com/codeup-news/codeup-start-dates-for-march-2022/'

In [8]:
# understand titles
title = entry_titles[0].a.text
title

'Codeup Start Dates for March 2022'

In [9]:
# try first link
url = entry_titles[0].a['href']

# get response with requests library
response = requests.get(url, headers = headers)

# get html
html = response.text

# make good soup
soup = BeautifulSoup(html)

# beautiful soup
# print(soup.prettify())

In [10]:
# get entry content
corpus = soup.select('.entry-content')
corpus

[<div class="entry-content">
 <p>As we approach the end of January we wanted to look forward to our next start dates for all of our current programs.</p>
 <h3>Full Stack Web Development – 3/7/22</h3>
 <p>Full Stack Web Development is the first program we built and also our most popular. You’ve asked and we listened! Our next Web Development cohort will start on 3/7/2022 and is ENTIRELY VIRTUAL! THESE SEATS WILL GO FAST!</p>
 <p>As one of the most in-demand jobs in the country, software and web development is the tech career with the newest jobs. In the U.S., there’s:</p>
 <ul>
 <li>1.5 million developer jobs*</li>
 <li>250,000 of them remain open</li>
 <li>a high growth rate of 13%*</li>
 </ul>
 <p> </p>
 <h3>Data Science – 3/22/22</h3>
 <p>Our first new Data Science class of 2022 starts Monday 3/22/2022 at our downtown campus at the Vogue building.</p>
 <p>Why consider pivoting careers to Data Science?</p>
 <ul>
 <li>#1 job in America from 2016-2020 (Glassdoor*)</li>
 <li>650% increas

In [11]:
# get content
content = corpus[0].text.strip()
content

'As we approach the end of January we wanted to look forward to our next start dates for all of our current programs.\nFull Stack Web Development – 3/7/22\nFull Stack Web Development is the first program we built and also our most popular. You’ve asked and we listened! Our next Web Development cohort will start on 3/7/2022 and is ENTIRELY VIRTUAL! THESE SEATS WILL GO FAST!\nAs one of the most in-demand jobs in the country, software and web development is the tech career with the newest jobs. In the U.S., there’s:\n\n1.5 million developer jobs*\n250,000 of them remain open\na high growth rate of 13%*\n\n\xa0\nData Science – 3/22/22\nOur first new Data Science class of 2022 starts Monday 3/22/2022 at our downtown campus at the Vogue building.\nWhy consider pivoting careers to Data Science?\n\n#1 job in America from 2016-2020 (Glassdoor*)\n650% increase in data science positions since 2012\nNearly 12 million new jobs between 2019 and 2029\n31% ten-year growth rate\n\nThe supply of data sc

In [12]:
# get publish date
publish_date = soup.select('.published')
publish_date

[<span class="published">Jan 26, 2022</span>]

In [13]:
# get date
date = publish_date[0].text
date

'Jan 26, 2022'

In [14]:
def get_blog_articles():
    
    '''
    This function acquires the blog articles from the Codeup Blog
    '''
    
    url = 'https://codeup.com/blog/'    # define url to scrape
    headers = {'User-Agent': 'Codeup Data Server'}    # define headers
    response = requests.get(url, headers = headers)    # get a response from the server
    html = response.text    # assign variable to text requested
    soup = BeautifulSoup(html)    # convert html into a good soup
    
    entry_titles = soup.select('.entry-title')    # select .entry-title elements
    output = []    # set an empty list
    
    for n in range(len(entry_titles)):    # commence loop to run through list of .entry-title elements
        link = entry_titles[n].a['href']    # get link
        title = entry_titles[n].a.text    # get title
        response = requests.get(link, headers = headers)    # get response with requests library
        html = response.text    # get html
        soup = BeautifulSoup(html)    # water the soup
        corpus = soup.select('.entry-content')    # get the corpus
        content = corpus[0].text.strip()    # get the content of the corpus
        publish_date = soup.select('.published')    # get publish date
        date = publish_date[0].text    # get date
    
        article_info = {               # create dictionary
            'publish_date': date,
            'article': title,
            'content': content,
            'link': link
        }
        
        output.append(article_info)    # add dictionary to the list
        
    blogs = pd.DataFrame(output)    # create dataframe from accumulated info
    
    return blogs

In [15]:
blogs = get_blog_articles()
blogs

Unnamed: 0,publish_date,article,content,link
0,"Jan 26, 2022",Codeup Start Dates for March 2022,As we approach the end of January we wanted to...,https://codeup.com/codeup-news/codeup-start-da...
1,"Jan 7, 2022",VET TEC Funding Now Available For Dallas Veterans,We are so happy to announce that VET TEC benef...,https://codeup.com/codeup-news/vet-tec-funding...
2,"Dec 30, 2021",Dallas Campus Re-opens With New Grant Partner,We are happy to announce that our Dallas campu...,https://codeup.com/codeup-news/dallas-campus-r...
3,"Nov 30, 2021",Codeup Dallas Open House,Come join us for the re-opening of our Dallas ...,https://codeup.com/dallas-newsletter/codeup-da...
4,"Nov 19, 2021",Codeup’s Placement Team Continues Setting Records,Our Placement Team is simply defined as a grou...,https://codeup.com/codeup-news/codeups-placeme...
5,"Nov 18, 2021","IT Certifications 101: Why They Matter, and Wh...","AWS, Google, Azure, Red Hat, CompTIA…these are...",https://codeup.com/it-training/it-certificatio...
6,"Nov 17, 2021",A rise in cyber attacks means opportunities fo...,"In the last few months, the US has experienced...",https://codeup.com/cybersecurity/a-rise-in-cyb...
7,"Nov 4, 2021",Use your GI Bill® benefits to Land a Job in Tech,"As the end of military service gets closer, ma...",https://codeup.com/codeup-news/use-your-gi-bil...
8,"Oct 28, 2021",Which program is right for me: Cyber Security ...,What IT Career should I choose?\nIf you’re thi...,https://codeup.com/tips-for-prospective-studen...
9,"Oct 21, 2021",What the Heck is System Engineering?,Codeup offers a 13-week training program: Syst...,https://codeup.com/it-training/what-the-heck-i...


In [16]:
# view content from 'Codeup Start Dates for March 2022' (index 0)
blogs.content[0]

'As we approach the end of January we wanted to look forward to our next start dates for all of our current programs.\nFull Stack Web Development – 3/7/22\nFull Stack Web Development is the first program we built and also our most popular. You’ve asked and we listened! Our next Web Development cohort will start on 3/7/2022 and is ENTIRELY VIRTUAL! THESE SEATS WILL GO FAST!\nAs one of the most in-demand jobs in the country, software and web development is the tech career with the newest jobs. In the U.S., there’s:\n\n1.5 million developer jobs*\n250,000 of them remain open\na high growth rate of 13%*\n\n\xa0\nData Science – 3/22/22\nOur first new Data Science class of 2022 starts Monday 3/22/2022 at our downtown campus at the Vogue building.\nWhy consider pivoting careers to Data Science?\n\n#1 job in America from 2016-2020 (Glassdoor*)\n650% increase in data science positions since 2012\nNearly 12 million new jobs between 2019 and 2029\n31% ten-year growth rate\n\nThe supply of data sc

In [17]:
# import time

# # Save the blogs as json:
# today = time.strftime('%Y-%m-%d')
# blogs.to_json(f'codeup_blog_{today}.json')

## Exercise II News Articles

We will now be scraping text data from inshorts, a website that provides a brief overview of many different topics.

Write a function that scrapes the news articles for the following topics:

* Business
* Sports
* Technology
* Entertainment

The end product of this should be a function named get_news_articles that returns a list of dictionaries, where each dictionary has this shape:


{
    'title': 'The article title',
    'content': 'The article content',
    'category': 'business' # for example
}

Hints:

a. Start by inspecting the website in your browser. Figure out which elements will be useful.

b. Start by creating a function that handles a single article and produces a dictionary like the one above.

c. Next create a function that will find all the articles on a single page and call the function you created in the last step for every article on the page.

d. Now create a function that will use the previous two functions to scrape the articles from all the pages that you need, and do any additional processing that needs to be done.
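
Hints b through d above can be sketched as three small functions layered on top of each other. This is only one possible layout: the function names are hypothetical, the selectors (.news-card, itemprop values) are assumptions about the page markup, and the sample HTML is made up so the example runs offline.

```python
from bs4 import BeautifulSoup

def parse_card(card, category):
    """Hint b: turn a single article element into a dictionary."""
    return {
        'title': card.find('span', itemprop='headline').get_text().strip(),
        'content': card.find('div', itemprop='articleBody').get_text().strip(),
        'category': category,
    }

def parse_page(html, category):
    """Hint c: find every article on one page and parse each one."""
    soup = BeautifulSoup(html, 'html.parser')
    return [parse_card(card, category) for card in soup.select('.news-card')]

def parse_all(pages):
    """Hint d: combine the articles from several pages.

    `pages` maps a category name to that page's HTML, so the download
    step stays separate from the parsing.
    """
    articles = []
    for category, html in pages.items():
        articles.extend(parse_page(html, category))
    return articles

# hypothetical markup standing in for a downloaded category page
sample = '''
<div class="news-card">
  <span itemprop="headline">Headline one</span>
  <div itemprop="articleBody">Body one</div>
</div>
<div class="news-card">
  <span itemprop="headline">Headline two</span>
  <div itemprop="articleBody">Body two</div>
</div>
'''

articles = parse_all({'business': sample})
print(articles)
```

In real use, the values of `pages` would come from requests.get(url, headers = headers).text for each category URL.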

In [18]:
url = 'https://inshorts.com/en/read/business'    # define url
headers = {'User-Agent': 'Codeup Data Server'}     # define headers
response = requests.get(url, headers = headers)    # get a response from the server
html = response.text    # assign variable to text requested
soup = BeautifulSoup(html)    # convert html into a good soup
# print(soup.prettify())    # beautiful soup

In [19]:
cards = soup.select('.news-card')
cards[0]

<div class="news-card z-depth-1" itemscope="" itemtype="http://schema.org/NewsArticle">
<span content="" itemid="https://inshorts.com/en/news/rbi-cancels-licence-of-mahabased-independence-cooperative-bank-1643895020128" itemprop="mainEntityOfPage" itemscope="" itemtype="https://schema.org/WebPage"></span>
<span itemprop="author" itemscope="itemscope" itemtype="https://schema.org/Person">
<span content="Shalini Ojha" itemprop="name"></span>
</span>
<span content="RBI cancels licence of Maha-based Independence Co-operative Bank" itemprop="description"></span>
<span itemprop="image" itemscope="" itemtype="https://schema.org/ImageObject">
<meta content="https://static.inshorts.com/inshorts/images/v1/variants/jpg/m/2022/02_feb/3_thu/img_1643894076621_61.jpg?" itemprop="url"/>
<meta content="864" itemprop="width"/>
<meta content="483" itemprop="height"/>
</span>
<span itemprop="publisher" itemscope="itemscope" itemtype="https://schema.org/Organization">
<span content="https://inshorts.com/" 

In [20]:
# headline
headline = cards[0].find('span', itemprop = 'headline').text
headline

'RBI cancels licence of Maha-based Independence Co-operative Bank'

In [21]:
# corpus
corpus = cards[0].find('div', itemprop = 'articleBody').text
corpus

"RBI has cancelled licence of Maharashtra-based Independence Co-operative Bank, citing inadequate capital. It will cease to carry on banking operations from the close of business on February 3. In the present situation, the bank won't be able to pay its depositors in full, RBI said. It added that the bank didn't comply with multiple sections of Banking Regulation Act, 1949. "

In [22]:
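# publish date
# note: the keyword 'clas' is not a typo for class_ here; it filters on an attribute literally named clas, which is what matched on this page
date = cards[0].find('span', clas = 'date').text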
date

'03 Feb 2022,Thursday'

In [23]:
# author
author = cards[0].find('span', class_ = 'author').text
author

'Shalini Ojha'

In [24]:
# source link
source = cards[0].find('a', class_ = 'source')['href'].strip()
source

'https://www.moneycontrol.com/news/business/rbi-cancels-licence-of-maharashtra-based-independence-co-operative-bank-8034241.html/amp?utm_campaign=fullarticle&utm_medium=referral&utm_source=inshorts'

In [25]:
def get_news_articles(category):
    
    '''
    This function acquires the news articles from the inshorts website for a specified category of news
    '''
    
    url = f'https://inshorts.com/en/read/{category}'    # define url
    headers = {'User-Agent': 'Codeup Data Server'}     # define headers
    response = requests.get(url, headers = headers)    # get a response from the server
    html = response.text    # assign variable to text requested
    soup = BeautifulSoup(html)    # convert html
    
    output = []    # set an empty list to fill with dictionaries
    
    cards = soup.select('.news-card')    # select all news-cards
    
    # commence loop
    for n in range(len(cards)):
        headline = cards[n].find('span', itemprop = 'headline').text    # get headline
        corpus = cards[n].find('div', itemprop = 'articleBody').text    # get corpus
        date = cards[n].find('span', clas = 'date').text    # get date
        author = cards[n].find('span', class_ = 'author').text    # get author
        source = cards[n].find('a', class_ = 'source')['href'].strip()    # get source link
    
        article_info = {'publish_date': date,    # create dictionary from gathered values
                        'article': headline,
                        'content': corpus,
                        'author': author,
                        'source': source
                       }
        
        output.append(article_info)    # add dictionaries to list
    
    inshorts = pd.DataFrame(output)    # plug dataframe
    
    return inshorts


def get_news(category):
    
    '''
    This function acquires the news articles from the inshorts website for a specified category of news.
    Unlike get_news_articles, it records the category and does not collect the source link.
    '''
    
    url = f'https://inshorts.com/en/read/{category}'    # define url
    headers = {'User-Agent': 'Codeup Data Server'}     # define headers
    response = requests.get(url, headers = headers)    # get a response from the server
    html = response.text    # assign variable to text requested
    soup = BeautifulSoup(html)    # convert html
    
    output = []    # set an empty list to fill with dictionaries
    
    cards = soup.select('.news-card')    # select all news-cards
    
    # commence loop
    for n in range(len(cards)):
        headline = cards[n].find('span', itemprop = 'headline').text    # get headline
        corpus = cards[n].find('div', itemprop = 'articleBody').text    # get corpus
        date = cards[n].find('span', clas = 'date').text    # get date
        author = cards[n].find('span', class_ = 'author').text    # get author
    
        article_info = {'category': category,
                        'publish_date': date,    # create dictionary from gathered values
                        'article': headline,
                        'content': corpus,
                        'author': author,
                       }
        
        output.append(article_info)    # add dictionaries to list
    
    inshorts = pd.DataFrame(output)    # plug dataframe
    
    return inshorts

In [26]:
business = get_news_articles('business')
business

Unnamed: 0,publish_date,article,content,author,source
0,"03 Feb 2022,Thursday",RBI cancels licence of Maha-based Independence...,RBI has cancelled licence of Maharashtra-based...,Shalini Ojha,https://www.moneycontrol.com/news/business/rbi...
1,"04 Feb 2022,Friday",This is an infrastructure and growth-focused B...,Capex outlay has been increased and private ca...,Roshan Gupta,https://www.smallcase.com/smallcase/infra-trac...
2,"04 Feb 2022,Friday","Self-taught beautician to micro-entrepreneur, ...",The latest episode of Urban Company Impact int...,Roshan Gupta,https://bit.ly/3GuNrVA
3,"03 Feb 2022,Thursday",Facebook parent Meta's $230-billion wipeout bi...,Facebook's parent Meta's shares plunged 27% an...,Pragya Swastik,https://www.bloombergquint.com/business/facebo...
4,"03 Feb 2022,Thursday",Why did Facebook's parent company Meta lose $2...,Facebook parent Meta's shares fell over 20% ea...,Pragya Swastik,https://www.businessinsider.in/stock-market/ne...
5,"04 Feb 2022,Friday","Ambani, Adani become richer than Zuckerberg af...",Reliance Industries Chairman Mukesh Ambani and...,Pragya Swastik,https://www.moneycontrol.com/news/business/met...
6,"03 Feb 2022,Thursday",Mark Zuckerberg loses $31 bn in one of the big...,Meta CEO Mark Zuckerberg's wealth dropped by $...,Pragya Swastik,https://www.bloombergquint.com/business/mark-z...
7,"04 Feb 2022,Friday",Facebook facing unprecedented level of competi...,"At a virtual meeting, Meta CEO Mark Zuckerberg...",Kiran Khatri,https://www.bloombergquint.com/business/zucker...
8,"04 Feb 2022,Friday",Facebook parent Meta's rating cut by JPMorgan ...,JPMorgan Chase analyst Doug Anmuth has downgra...,Kiran Khatri,https://www.businessinsider.in/stock-market/ne...
9,"04 Feb 2022,Friday","If board thinks I shouldn't be MD, they can gi...",When asked whether he considered resigning ami...,Kiran Khatri,https://www.moneycontrol.com/news/business/sta...


In [27]:
url = 'https://inshorts.com/en/read/sports'    # define url
headers = {'User-Agent': 'Codeup Data Server'}     # define headers
response = requests.get(url, headers = headers)    # get a response from the server
html = response.text    # assign variable to text requested
soup = BeautifulSoup(html)    # convert html into a good soup
# print(soup.prettify())    # beautiful soup

In [28]:
# source link
source = cards[1].find('a', class_ = 'source')['href'].strip()
source

'https://www.smallcase.com/smallcase/infra-tracker-SCTR_0005?&utm_source=inshorts-budget&utm_medium=dm-branding-iii-nativearticle2&utm_campaign=web-statics-infra'

In [29]:
get_news_articles('sports')

TypeError: 'NoneType' object is not subscriptable
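
The TypeError above is what Python raises when find returns None and the result is then indexed with ['href'], which suggests that at least one card on the sports page has no a element with class source. A minimal sketch of one way to guard against that is shown below; get_source_link is a hypothetical helper name and the two cards are made-up markup.

```python
from bs4 import BeautifulSoup

def get_source_link(card):
    """Return the card's source link, or None when the card has no such anchor."""
    anchor = card.find('a', class_='source')
    if anchor is None:    # find() returns None when nothing matches
        return None
    return anchor.get('href', '').strip() or None

# hypothetical cards: the first has a source link, the second does not
html = '''
<div class="news-card"><a class="source" href=" https://example.com/story ">read more</a></div>
<div class="news-card"><span>no link here</span></div>
'''

cards = BeautifulSoup(html, 'html.parser').select('.news-card')
links = [get_source_link(card) for card in cards]
print(links)
```

With a helper like this, the scraping loop can keep the source column and store None for cards without a link instead of dropping the column entirely.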

In [30]:
# test function without source link code
get_news('sports')

Unnamed: 0,category,publish_date,article,content,author
0,sports,"03 Feb 2022,Thursday",I am like a kid who wants to play under MS Dho...,On being asked if he has any preferred team in...,Anmol Sharma
1,sports,"03 Feb 2022,Thursday",This auction I am going to just sit back and s...,"Mayank Agarwal, who was retained by Punjab Kin...",Anmol Sharma
2,sports,"04 Feb 2022,Friday",Footballer dies after collapsing on pitch due ...,A Greek footballer died after collapsing on th...,Ankush Verma
3,sports,"03 Feb 2022,Thursday",Dhawan shares pic after testing COVID-19 posit...,Team India opener Shikhar Dhawan took to Insta...,Anmol Sharma
4,sports,"03 Feb 2022,Thursday",Vettori told me 'Run towards Kohli not batters...,"Recalling the time when he joined RCB, Yuzvend...",Anmol Sharma
5,sports,"03 Feb 2022,Thursday","DD Sports to not live telecast opening, closin...",Prasar Bharati's CEO Shashi Shekhar Vempati tw...,Pragya Swastik
6,sports,"04 Feb 2022,Friday",Kohli virtually interacts with India U-19 cric...,Former Team India captain Virat Kohli virtuall...,Anmol Sharma
7,sports,"04 Feb 2022,Friday",Pakistan fast bowler Hasnain's action found il...,Pakistan fast bowler Mohammad Hasnain has been...,Anmol Sharma
8,sports,"03 Feb 2022,Thursday",Fan wears jerseys of 7 countries at the same t...,A video went viral showing a fan wearing jerse...,Anmol Sharma
9,sports,"03 Feb 2022,Thursday",Sachin congratulates Neeraj Chopra for 2022 La...,Sachin Tendulkar took to Twitter to congratula...,Anmol Sharma


In [31]:
def get_inshorts():
    
    '''
    This function loops through the aforementioned categories and scrapes news article info from these pages
    '''
    
    categories = ['business', 'sports', 'technology', 'entertainment']    # set list of desired categories
    inshorts = pd.DataFrame()    # set empty frame
    for cat in categories:    # commence loop
        df = get_news(cat)    # render dataframe from news article data
        inshorts = pd.concat([inshorts, df])    # concatenate dataframes
    
    return inshorts

In [32]:
# test function
inshorts = get_inshorts()
inshorts

Unnamed: 0,category,publish_date,article,content,author
0,business,"03 Feb 2022,Thursday",RBI cancels licence of Maha-based Independence...,RBI has cancelled licence of Maharashtra-based...,Shalini Ojha
1,business,"04 Feb 2022,Friday",This is an infrastructure and growth-focused B...,Capex outlay has been increased and private ca...,Roshan Gupta
2,business,"04 Feb 2022,Friday","Self-taught beautician to micro-entrepreneur, ...",The latest episode of Urban Company Impact int...,Roshan Gupta
3,business,"03 Feb 2022,Thursday",Facebook parent Meta's $230-billion wipeout bi...,Facebook's parent Meta's shares plunged 27% an...,Pragya Swastik
4,business,"03 Feb 2022,Thursday",Why did Facebook's parent company Meta lose $2...,Facebook parent Meta's shares fell over 20% ea...,Pragya Swastik
...,...,...,...,...,...
19,entertainment,"04 Feb 2022,Friday",'365 Days' star Michele to make Indian debut w...,"Italian actor Michele Morrone, who starred in ...",Udit Gupta
20,entertainment,"04 Feb 2022,Friday",Kangana has been a very supportive & endearing...,"Nawazuddin Siddiqui, who has wrapped up Kangan...",Udit Gupta
21,entertainment,"04 Feb 2022,Friday",Rejected H'wood projects where Indians weren't...,"Nitu Chandra, who made her Hollywood debut wit...",Udit Gupta
22,entertainment,"04 Feb 2022,Friday",Waheeda Rehman stood barefoot in temple set at...,Rakeysh Omprakash Mehra recalled shooting for ...,Udit Gupta


### Bonus: cache the data

Write your code such that the acquired data is saved locally in some form or fashion. Your functions that retrieve the data should prefer to read the local data instead of having to make all the requests every time the function is called. Include a boolean flag in the functions to allow the data to be acquired "fresh" from the actual sources (re-writing your local cache).
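
One way to approach the caching described above is a small wrapper that reads a local JSON file when it exists and only calls the scraping function otherwise. The sketch below is an assumption about how this could look, not the notebook's own solution: load_or_fetch, fake_fetch, and the file name are hypothetical, and fake_fetch stands in for a real scraper so the example runs offline.

```python
import json
import os
import tempfile

def load_or_fetch(path, fetch, fresh=False):
    """Return cached records from `path`, calling `fetch()` only when needed.

    `fetch` is any zero-argument function that returns a list of dictionaries.
    Passing fresh=True forces a new acquisition and overwrites the cache file.
    """
    if not fresh and os.path.exists(path):
        with open(path, encoding='utf-8') as f:
            return json.load(f)
    records = fetch()
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(records, f)
    return records

calls = []    # records how many times the "scraper" actually ran

def fake_fetch():
    """Hypothetical stand-in for a scraping function."""
    calls.append(1)
    return [{'title': 'Example', 'content': 'Text', 'category': 'business'}]

cache_path = os.path.join(tempfile.mkdtemp(), 'articles.json')
first = load_or_fetch(cache_path, fake_fetch)                # no cache yet: fetches and writes
second = load_or_fetch(cache_path, fake_fetch)               # cache exists: reads the file
third = load_or_fetch(cache_path, fake_fetch, fresh=True)    # forced refresh: fetches again
print(len(calls))
```

The same pattern works with a DataFrame by swapping the JSON calls for to_csv and pd.read_csv.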

In [33]:
# inshorts.to_csv('inshorts_nadir.csv')

In [34]:
# verify data cached appropriately
inshorts = pd.read_csv('inshorts_nadir.csv', index_col = 0)
inshorts

Unnamed: 0,category,publish_date,article,content,author
0,business,"03 Feb 2022,Thursday",RBI cancels licence of Maha-based Independence...,RBI has cancelled licence of Maharashtra-based...,Shalini Ojha
1,business,"04 Feb 2022,Friday",This is an infrastructure and growth-focused B...,Capex outlay has been increased and private ca...,Roshan Gupta
2,business,"04 Feb 2022,Friday","Self-taught beautician to micro-entrepreneur, ...",The latest episode of Urban Company Impact int...,Roshan Gupta
3,business,"03 Feb 2022,Thursday",Facebook parent Meta's $230-billion wipeout bi...,Facebook's parent Meta's shares plunged 27% an...,Pragya Swastik
4,business,"04 Feb 2022,Friday",Facebook facing unprecedented level of competi...,"At a virtual meeting, Meta CEO Mark Zuckerberg...",Kiran Khatri
...,...,...,...,...,...
19,entertainment,"03 Feb 2022,Thursday",I don't work thinking I'm so many films old: D...,"Actress Deepika Padukone, who made her Bollywo...",Kriti Kambiri
20,entertainment,"04 Feb 2022,Friday",Ajay Devgn's first look from Alia-starrer 'Gan...,Ajay Devgn has shared the first look of his ch...,Udit Gupta
21,entertainment,"04 Feb 2022,Friday",Rejected H'wood projects where Indians weren't...,"Nitu Chandra, who made her Hollywood debut wit...",Udit Gupta
22,entertainment,"04 Feb 2022,Friday","Emraan shares son's pic on his b'day, says 'Li...",Actor Emraan Hashmi took to Twitter on Thursda...,Udit Gupta


In [35]:
# verify data cached appropriately
blogs = pd.read_json('codeup_blog_2022-02-04.json')
blogs

Unnamed: 0,publish_date,article,content,link
0,"Jan 26, 2022",Codeup Start Dates for March 2022,As we approach the end of January we wanted to...,https://codeup.com/codeup-news/codeup-start-da...
1,"Jan 7, 2022",VET TEC Funding Now Available For Dallas Veterans,We are so happy to announce that VET TEC benef...,https://codeup.com/codeup-news/vet-tec-funding...
2,"Dec 30, 2021",Dallas Campus Re-opens With New Grant Partner,We are happy to announce that our Dallas campu...,https://codeup.com/codeup-news/dallas-campus-r...
3,"Nov 30, 2021",Codeup Dallas Open House,Come join us for the re-opening of our Dallas ...,https://codeup.com/dallas-newsletter/codeup-da...
4,"Nov 19, 2021",Codeup’s Placement Team Continues Setting Records,Our Placement Team is simply defined as a grou...,https://codeup.com/codeup-news/codeups-placeme...
5,"Nov 18, 2021","IT Certifications 101: Why They Matter, and Wh...","AWS, Google, Azure, Red Hat, CompTIA…these are...",https://codeup.com/it-training/it-certificatio...
6,"Nov 17, 2021",A rise in cyber attacks means opportunities fo...,"In the last few months, the US has experienced...",https://codeup.com/cybersecurity/a-rise-in-cyb...
7,"Nov 4, 2021",Use your GI Bill® benefits to Land a Job in Tech,"As the end of military service gets closer, ma...",https://codeup.com/codeup-news/use-your-gi-bil...
8,"Oct 28, 2021",Which program is right for me: Cyber Security ...,What IT Career should I choose?\nIf you’re thi...,https://codeup.com/tips-for-prospective-studen...
9,"Oct 21, 2021",What the Heck is System Engineering?,Codeup offers a 13-week training program: Syst...,https://codeup.com/it-training/what-the-heck-i...
