# News Article Analysis
The proliferation of the Internet has changed the way we consume content. For example, most of us now read the news online. With so many news articles published every day, it would be useful to come up with a "smart summary" of what is going on each day. One basic way of automating this is to analyze how often certain keywords appear throughout the week.

In this exercise, you will leverage Amazon Web Services (AWS) to build an automated dashboard that a user can interact with to identify the top keywords among CNBC articles for each day and compare their frequencies with those of the past week. The dashboard should recommend the top keywords for the day, let the user choose their own target keywords, and compare those keywords' frequencies with previous days.


## Why are we doing this? This doesn't relate to data science?

I totally understand where you're coming from. While this exercise does not involve training Machine Learning and/or Deep Learning models, part of being a good data scientist is picking up new tools and technologies and presenting your findings in a way people can actually use. Data science is less about the tools you use and more about the story you can tell. This exercise will help you get acclimated to AWS (which might be new to some of you, and that's totally okay) while also teaching you how to present your findings in a user-friendly way.

## Basic Setup
Go to the [AWS website](https://aws.amazon.com/) and create an account. You will need this for the project.

## Plan of Attack
So, this problem might seem understandable, but a bit daunting from a technical perspective. Let's come up with a potential/rough plan of attack (if you find other approaches, feel free to try them out):


### Step 1: Scraping the CNBC News Articles
Before even constructing the dashboard, we need a way of scraping the CNBC news articles in the first place. That calls for a function, defined in a `cnbc_scraper.py` file, that we can invoke to scrape the news articles for the past week and return that data in the form of, say, a CSV file.
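One possible shape for `cnbc_scraper.py` is sketched below using only the standard library. It is a starting point under stated assumptions, not a finished scraper: the names `LinkCollector`, `parse_articles`, `fetch_html`, and `to_csv` are hypothetical, and the code assumes that article links embed the publication date as `/YYYY/MM/DD/` in the URL, which you should verify against the live pages before relying on it.

```python
import csv
import io
import re
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Assumption: article links embed the publication date as /YYYY/MM/DD/.
ARTICLE_URL = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/")


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs for every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = " ".join("".join(self._text).split())
            self.links.append((self._href, title))
            self._href = None


def parse_articles(html):
    """Return one dict per dated article link, de-duplicated by URL."""
    parser = LinkCollector()
    parser.feed(html)
    seen, rows = set(), []
    for href, title in parser.links:
        match = ARTICLE_URL.search(href)
        if match and title and href not in seen:
            seen.add(href)
            rows.append({"date": "-".join(match.groups()), "title": title, "url": href})
    return rows


def fetch_html(url):
    """Download a page. The User-Agent value is a guess at what the site accepts."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def to_csv(rows):
    """Serialize article rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["date", "title", "url"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling `to_csv(parse_articles(fetch_html(url)))` on a section page would then give CSV text you can write to disk. Keeping the parsing separate from the download makes it easy to test `parse_articles` on a saved HTML snippet without hitting the network.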

### Step 2: Automate the Scraping of the CNBC News Articles
Once you get to this step, you know that your script to scrape the CNBC news articles is giving you data in a form such as CSV. However, we want to be able to run this scraper once each day. That can be automated via AWS. I would recommend looking into a service called AWS Lambda. AWS Lambda is a serverless service that can help you run "relatively lightweight" tasks at scale in an automated fashion. In our case, scraping the CNBC articles published today isn't too resource-intensive (after all, how many articles does CNBC publish per day across all topics?). AWS Lambda currently has a 15-minute time limit and a memory limit (though you shouldn't need to worry about either with the size of data you are working with right now). To set up a Lambda function, go to the AWS account you created and, under "Services", search for "Lambda."
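As a rough sketch of what the Lambda side could look like, the handler below scrapes one day of articles and stores the CSV. Several things here are assumptions rather than requirements of the exercise: storing the file in S3, the `ARTICLES_BUCKET` environment variable and its default bucket name, and the helpers `scrape_today` and `upload_csv` (with `scrape_today` standing in for your real scraper). The `(event, context)` signature is the standard one for Python Lambda handlers; the extra `upload` argument exists only so the handler can be exercised locally without AWS credentials.

```python
import datetime as dt
import json
import os

# Hypothetical bucket name; set ARTICLES_BUCKET in the Lambda configuration.
BUCKET = os.environ.get("ARTICLES_BUCKET", "my-cnbc-articles")


def scrape_today(day):
    """Placeholder for your scraper: should return CSV text for `day`."""
    return "date,title,url\n"


def upload_csv(bucket, key, body):
    """Write CSV text to S3 (assumed storage target for this sketch)."""
    import boto3  # imported lazily so the module loads without boto3 installed

    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))


def lambda_handler(event, context, upload=upload_csv):
    """Entry point: scrape today's articles and store them under a dated key."""
    day = dt.datetime.now(dt.timezone.utc).date()
    key = f"cnbc/{day.isoformat()}.csv"
    upload(BUCKET, key, scrape_today(day))
    return {"statusCode": 200, "body": json.dumps({"bucket": BUCKET, "key": key})}
```

To run it once a day, one option is to attach a scheduled trigger (for example an Amazon EventBridge schedule) to the function. One dated file per day also keeps the later "compare with the past week" step simple, since the dashboard only needs to read the seven most recent keys.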

### Step 3: Set Up the Dashboard
Okay, now for the storytelling. Once you have confirmed that the Lambda works, you will need to build a dashboard in AWS (check out the options out there, such as Grafana) that takes the CSV file produced by your Lambda and shows the top $K$ word counts for *one specific date*. You can play around with any preprocessing of the scraped CNBC articles before getting the word counts if you'd like.

Note that the user should be able to select a positive value of $K$, and your dashboard should show each word alongside its count. The user should also be able to choose their own preferred words. For some people, "Trump" could be more interesting than "Bitcoin", for example, and we want to allow that flexibility.
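The counting logic behind the dashboard could be prototyped as follows before wiring it into any dashboard tool. This is a minimal sketch: the function names and the tiny `STOPWORDS` set are illustrative choices, and it counts words in article titles only, which is an assumption about what your CSV contains.

```python
import re
from collections import Counter

# Small illustrative stopword list; extend it as part of your preprocessing.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
             "is", "are", "as", "at", "by", "with", "it", "its"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if len(w) > 1 and w not in STOPWORDS]


def count_words(titles):
    """Count word occurrences across all titles for a single date."""
    counts = Counter()
    for title in titles:
        counts.update(tokenize(title))
    return counts


def top_k(counts, k):
    """Return the k most frequent (word, count) pairs; k must be positive."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return counts.most_common(k)


def counts_for(counts, keywords):
    """Look up counts for user-chosen keywords (0 if a word never appears)."""
    return {word: counts[word.lower()] for word in keywords}
```

For example, with the titles `"Bitcoin rallies as Bitcoin ETF launches"` and `"Trump comments move markets"`, `top_k(count_words(titles), 1)` gives `[("bitcoin", 2)]`, and `counts_for` reports a count of 0 for a word such as "Gold" that never appears.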

Finally, repeat for the other 6 days. Make a graph or a bar chart that visualizes the changing frequencies of the words over the days. Then, automate for the upcoming days.
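To get from per-day counts to something you can chart, one approach is to reshape the data into one series per keyword, ordered by date. The sketch below assumes each row has `date` (in `YYYY-MM-DD` form) and `title` fields; both field names and the function name `daily_keyword_counts` are hypothetical.

```python
import re
from collections import Counter, defaultdict


def daily_keyword_counts(rows, keywords):
    """Return (sorted dates, {keyword: [count on each date]}) for chosen keywords."""
    wanted = [w.lower() for w in keywords]
    per_day = defaultdict(Counter)
    for row in rows:
        words = re.findall(r"[a-z']+", row["title"].lower())
        # Touching per_day[date] registers the date even when nothing matches.
        per_day[row["date"]].update(w for w in words if w in wanted)
    days = sorted(per_day)  # ISO dates sort correctly as plain strings
    series = {w: [per_day[d][w] for d in days] for w in wanted}
    return days, series
```

Each entry in `series` lines up with `days`, so it can be handed directly to a line or grouped bar chart in whichever plotting or dashboard tool you choose, with one line or bar group per keyword.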



In [4]:
# Test your cnbc_scraper function here!

In [None]:
# Test that you can invoke your AWS Lambda function (boto3 would help here!)
