# Clean & Analyze Social Media

## Introduction

Social media has become a ubiquitous part of modern life, with platforms such as Instagram, Twitter, and Facebook serving as essential communication channels. Social media data sets are vast and complex, making analysis a challenging task for businesses and researchers alike. In this project, we explore a simulated social media data set (for example, tweets) to understand trends in likes across different categories.

## Prerequisites

To follow along with this project, you should have a basic understanding of Python programming and data analysis concepts. In addition, you may want to use the following packages in your Python environment:

- pandas
- Matplotlib
- ...

These packages should already be installed in Coursera's Jupyter Notebook environment. If you need packages that are not included there, or you are working off platform, install them by running `!pip install packagename` in a notebook cell, for example:

- `!pip install pandas`
- `!pip install matplotlib`
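
Before installing anything, it can help to check which packages are already importable. The following is a minimal sketch using only the standard library; the `required` tuple and the `missing_packages` helper are illustrative names, not part of the project.

```python
import importlib.util

# Packages named in the list above; extend the tuple as needed.
required = ("pandas", "matplotlib")

def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(required)
if missing:
    print("Install these first:", ", ".join(missing))
else:
    print("All listed packages are available.")
```

Only the packages reported as missing then need a `!pip install` cell.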

## Project Scope

The objective of this project is to analyze tweets (or other social media data) and gain insights into user engagement. We will explore the data set using visualization techniques to understand the distribution of likes across different categories. Finally, we will analyze the data to draw conclusions about the most popular categories and the overall engagement on the platform.
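
To make the goal concrete, here is a small sketch of the kind of analysis described above: simulate posts with a category and a like count, then compare average likes per category. The category names, the number of posts, and the range of like counts are assumptions chosen only for illustration.

```python
import random
from collections import defaultdict
from statistics import mean

random.seed(0)  # fixed seed so the simulated data is repeatable

# Hypothetical category names and value range, chosen only for illustration.
categories = ["Food", "Travel", "Fitness", "Music"]
posts = [
    {"Category": random.choice(categories), "Likes": random.randint(0, 10_000)}
    for _ in range(500)
]

# Group the like counts by category.
likes_by_category = defaultdict(list)
for post in posts:
    likes_by_category[post["Category"]].append(post["Likes"])

# Average likes per category, highest first.
average_likes = sorted(
    ((category, mean(likes)) for category, likes in likes_by_category.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for category, avg in average_likes:
    print(f"{category}: {avg:.1f}")
```

With the same data in a pandas DataFrame `df`, the equivalent aggregation would be `df.groupby("Category")["Likes"].mean()`.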

## Step 1: Importing Required Libraries

As the heading suggests, the first step is to import the libraries used in the project. In this case, we need pandas, NumPy, Matplotlib, seaborn, and Python's built-in `random` module.

pandas is a library for data manipulation and analysis. NumPy is a library for numerical computation. Matplotlib is a library for data visualization. seaborn builds on Matplotlib for statistical data visualization. `random` is a standard-library module for generating pseudo-random numbers.

In [1]:
#import the relevant libraries
from bs4 import BeautifulSoup 
import requests 
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 
import seaborn as sns 
import json
import folium
from folium.features import DivIcon
from folium.plugins import MarkerCluster
from folium.plugins import MousePosition
from folium import Map, LayerControl
from folium.raster_layers import TileLayer
from folium.plugins import FloatImage

<h2> Setting up the requests from the National Weather Service </h2>

In [2]:
# List of county codes and a dictionary mapping each code to its county name
list_of_codes = ['NCC001','NCC003', 'NCC005', 'NCC007', 'NCC009', 'NCC011', 'NCC013', 'NCC015', 'NCC017', 'NCC019', 'NCC021', 'NCC023', 'NCC025', 'NCC027', 'NCC029', 'NCC031', 
'NCC033', 'NCC035', 'NCC037', 'NCC039', 'NCC041', 'NCC043', 'NCC045', 'NCC047', 'NCC049', 'NCC051', 'NCC053', 'NCC055', 'NCC057', 'NCC059', 'NCC061', 'NCC063', 'NCC065', 'NCC067', 
'NCC069', 'NCC071', 'NCC073', 'NCC075', 'NCC077', 'NCC079', 'NCC081', 'NCC083', 'NCC085', 'NCC087', 'NCC089', 'NCC091', 'NCC093', 'NCC095', 'NCC097', 'NCC099', 'NCC101', 'NCC103', 
'NCC105', 'NCC107', 'NCC109', 'NCC113', 'NCC115', 'NCC117', 'NCC111', 'NCC119', 'NCC121', 'NCC123', 'NCC125', 'NCC127', 'NCC129', 'NCC131', 'NCC133', 'NCC135', 'NCC137', 'NCC139', 
'NCC141', 'NCC143', 'NCC145', 'NCC147', 'NCC149', 'NCC151', 'NCC153', 'NCC155', 'NCC157', 'NCC159', 'NCC161', 'NCC163', 'NCC165', 'NCC167', 'NCC169', 'NCC171', 'NCC173', 'NCC175', 
'NCC177', 'NCC179', 'NCC181', 'NCC183', 'NCC185', 'NCC187', 'NCC189', 'NCC191', 'NCC193', 'NCC195', 'NCC197', 'NCC199']
dictionary_of_counties = {'NCC001': 'Alamance',"NCC003" : "Alexander", 'NCC005' : 'Alleghany', 'NCC007' : 'Anson', 'NCC009' : 'Ashe', 'NCC011' : 'Avery', 
'NCC013' : 'Beaufort', 'NCC015' : 'Bertie', 'NCC017' : 'Bladen', 'NCC019' : 'Brunswick', 'NCC021' : 'Buncombe', 'NCC023' : 'Burke', 'NCC025' : 'Cabarrus', 
'NCC027' : 'Caldwell', 'NCC029' : 'Camden', 'NCC031' : 'Carteret', 'NCC033' : 'Caswell', 'NCC035' : 'Catawba', 'NCC037' : 'Chatham', 'NCC039' : 'Cherokee', 
'NCC041' : 'Chowan', 'NCC043' : 'Clay', 'NCC045' : 'Cleveland', 'NCC047' : 'Columbus', 'NCC049' : 'Craven', 'NCC051' : 'Cumberland', 'NCC053' : 'Currituck', 
'NCC055' : 'Dare', 'NCC057' : 'Davidson', 'NCC059' : 'Davie', 'NCC061' : 'Duplin', 'NCC063' : 'Durham', 'NCC065' : 'Edgecombe', 'NCC067' : 'Forsyth', 
'NCC069' : 'Franklin', 'NCC071' : 'Gaston', 'NCC073' : 'Gates', 'NCC075' : 'Graham', 'NCC077' : 'Granville', 'NCC079' : 'Greene', 'NCC081' : 'Guilford', 'NCC083' : 'Halifax', 
'NCC085' : 'Harnett', 'NCC087' : 'Haywood', 'NCC089' : 'Henderson', 'NCC091' : 'Hertford', 'NCC093' : 'Hoke', 'NCC095' : 'Hyde', 'NCC097' : 'Iredell', 'NCC099' : 'Jackson', 
'NCC101' : 'Johnston', 'NCC103' : 'Jones', 'NCC105' : 'Lee', 'NCC107' : 'Lenoir', 'NCC109' : 'Lincoln', 'NCC113' : 'Macon', 'NCC115' : 'Madison', 'NCC117' : 'Martin', 
'NCC111' : 'McDowell', 'NCC119' : 'Mecklenburg', 'NCC121' : 'Mitchell', 'NCC123' : 'Montgomery', 'NCC125' : 'Moore', 'NCC127' : 'Nash', 'NCC129' : 'New Hanover', 
'NCC131' : 'Northampton', 'NCC133' : 'Onslow', 'NCC135' : 'Orange', 'NCC137' : 'Pamlico', 'NCC139' : 'Pasquotank', 'NCC141' : 'Pender', 'NCC143' : 'Perquimans', 
'NCC145' : 'Person', 'NCC147' : 'Pitt', 'NCC149' : 'Polk', 'NCC151' : 'Randolph', 'NCC153' : 'Richmond', 'NCC155' : 'Robeson', 'NCC157' : 'Rockingham', 
'NCC159' : 'Rowan', 'NCC161' : 'Rutherford', 'NCC163' : 'Sampson', 'NCC165' : 'Scotland', 'NCC167' : 'Stanly', 'NCC169' : 'Stokes', 'NCC171' : 'Surry', 'NCC173' : 'Swain', 
'NCC175' : 'Transylvania', 'NCC177' : 'Tyrrell', 'NCC179' : 'Union', 'NCC181' : 'Vance', 'NCC183' : 'Wake', 'NCC185' : 'Warren', 'NCC187' : 'Washington', 'NCC189' : 'Watauga', 
'NCC191' : 'Wayne', 'NCC193' : 'Wilkes', 'NCC195' : 'Wilson', 'NCC197' : 'Yadkin', 'NCC199' : 'Yancey'}
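
The list and the dictionary above repeat the same 100 codes, so it is easy for them to drift apart when edited by hand. The sketch below rebuilds the codes from their numbering pattern ("NCC" followed by the odd numbers 001 to 199, which matches the entries listed above) and compares a code list against a name lookup. `generated_codes`, `check_codes`, and `sample_names` are illustrative names; `sample_names` is a three-entry stand-in for `dictionary_of_counties`.

```python
# Rebuild the code list from the numbering pattern: "NCC" plus odd numbers 001..199.
generated_codes = [f"NCC{number:03d}" for number in range(1, 200, 2)]

def check_codes(codes, names_by_code):
    """Return (codes with no name, named codes missing from the list)."""
    unnamed = [code for code in codes if code not in names_by_code]
    unlisted = [code for code in names_by_code if code not in codes]
    return unnamed, unlisted

# Tiny stand-in for dictionary_of_counties, using three entries from the cell above.
sample_names = {"NCC001": "Alamance", "NCC003": "Alexander", "NCC199": "Yancey"}
print(len(generated_codes), generated_codes[0], generated_codes[-1])
print(check_codes(["NCC001", "NCC003", "NCC005"], sample_names))
```

Note that the generated list is in numeric order, while the hand-written list places `NCC111` after `NCC117` to follow the alphabetical order of the county names. To keep the original order without retyping the codes, `list(dictionary_of_counties)` returns the dictionary's keys in insertion order.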

In [3]:
colors = ["white", "red"]


<h2>Getting Current Weather Updates</h2>

In [4]:
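status = []
for code in list_of_codes:
    # Request the alert feed for this county code
    url = f"https://alerts.weather.gov/cap/wwaatmget.php?x={code}&y=0"
    page = requests.get(url, timeout=30).text
    soup = BeautifulSoup(page, "html.parser")
    county = dictionary_of_counties[code]
    # The status message is read from the second <title> element in the response
    titles = soup.find_all("title")
    string = str(titles[1])
    if "There are no active watches" in string:
        # No alerts: keep the title text and flag the county with 0
        status.append([county, string, 0])
    else:
        # Active alert: keep the first <summary>, dropping the first 12 and
        # last 10 characters (the surrounding markup), and flag the county with 1
        temp = str(soup.find_all("summary")[0])
        temp = temp[12:-10]
        status.append([county, temp, 1])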
status_df = pd.DataFrame(status, columns = ["County", "Alert", "Codes"])
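
The loop above depends on fixed positions (`titles[1]`) and fixed character offsets (`[12:-10]`), which break if the feed layout changes. As a hedged alternative, the sketch below parses a feed by element name with the standard library's `xml.etree.ElementTree`. The two sample feeds are hand-written stand-ins, not real National Weather Service responses, and the assumption that the feed is Atom XML with one `<entry>` per alert is mine; `parse_alert` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical, hand-written feeds that imitate the shape the loop expects.
SAMPLE_QUIET = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Alerts for Wake (NCC183)</title>
  <entry>
    <title>There are no active watches, warnings or advisories</title>
  </entry>
</feed>"""

SAMPLE_ALERT = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Alerts for Dare (NCC055)</title>
  <entry>
    <title>Example advisory title</title>
    <summary>Example summary text (not a real alert)</summary>
  </entry>
</feed>"""

def parse_alert(xml_text, county):
    """Return [county, message, code] where code is 1 for an active alert, else 0."""
    root = ET.fromstring(xml_text)
    entry = root.find("atom:entry", ATOM)
    if entry is None:
        # Assumption: a feed without entries is treated like "no alert".
        return [county, "No entry found in feed", 0]
    title = entry.findtext("atom:title", default="", namespaces=ATOM)
    if "There are no active watches" in title:
        return [county, title, 0]
    summary = entry.findtext("atom:summary", default=title, namespaces=ATOM)
    return [county, summary.strip(), 1]

print(parse_alert(SAMPLE_QUIET, "Wake"))
print(parse_alert(SAMPLE_ALERT, "Dare"))
```

Unlike the original loop, this version stores the element text without the surrounding tags, so the popup text on the map would not include `<title>` markup.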



<h2>Building the Map</h2>

In [5]:
#setting the center of the map
raleigh = [35.7796, -78.6382]

#List of County Coordinates
gps_list = pd.read_csv("NCGPS.csv")
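
The map cell that follows looks up each county name from this file in `status_df` and calls `.item()`, which raises a `ValueError` unless exactly one row matches. It is therefore worth checking that the names in the coordinates file match the names in `dictionary_of_counties` exactly. The sketch below does this with the standard `csv` module; the column names, the sample rows (with approximate coordinates), and the `unmatched_counties` helper are assumptions, since the actual layout of `NCGPS.csv` is not shown.

```python
import csv
import io

# Hypothetical file contents; the real NCGPS.csv may use different column names.
sample_csv = """County,Latitude,Longitude
Alamance,36.04,-79.40
Mcdowell,35.68,-82.05
"""

def unmatched_counties(csv_text, known_names):
    """Return county names in the CSV that are not in known_names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["County"] for row in reader if row["County"] not in known_names]

# Stand-in for set(dictionary_of_counties.values()).
known = {"Alamance", "McDowell"}
print(unmatched_counties(sample_csv, known))
```

Here the check flags `Mcdowell` because its capitalization differs from `McDowell`, which is exactly the kind of mismatch that would stop the map loop.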

In [6]:
# Initialize the map
colors = ["white", "red"]
site_map = folium.Map(location=raleigh, zoom_start=7.5)
# For each county, add a Marker at its coordinates (Lat, Long), labeled with its county name
gps_redone = gps_list.reset_index()
for index, row in gps_redone.iterrows():
    # After reset_index(), position 1 holds the county name and positions 2 and 3 hold the coordinates
    coords = [float(row.iloc[2]), float(row.iloc[3])]
    label = str(row.iloc[1])
    # Look up this county's alert text and its 0/1 code, which indexes into colors
    alert1 = status_df[status_df["County"] == label]
    alert2 = alert1["Alert"].item()
    alert3 = alert1["Codes"].item()
    marker = folium.Marker(coords, tooltip=label, popup=alert2, icon=folium.Icon(color=colors[alert3]))
    site_map.add_child(marker)
site_map
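
The loop above filters `status_df` once per county and fails if a county has no matching row. One possible variation, sketched here with plain Python so it runs without the weather data, builds a dictionary once and falls back to a neutral default for unknown counties. The sample rows, the default message, and the `marker_style` helper are illustrative assumptions.

```python
colors = ["white", "red"]

# Stand-in rows in the same [County, Alert, Codes] layout as status.
status_rows = [
    ["Wake", "Example alert text (not a real alert)", 1],
    ["Dare", "There are no active watches, warnings or advisories", 0],
]

# One dictionary lookup per county instead of filtering the DataFrame each time.
alert_by_county = {county: (alert, code) for county, alert, code in status_rows}

def marker_style(county_name):
    """Return (popup text, icon color); counties without data get a neutral default."""
    alert, code = alert_by_county.get(county_name, ("No alert data available", 0))
    return alert, colors[code]

print(marker_style("Wake"))
print(marker_style("Hyde"))
```

In the notebook, the same dictionary could be built from `status_df.itertuples(index=False)`, and the two returned values would be passed as `popup` and `folium.Icon(color=...)` when creating each marker.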
