# Clean & Analyze Social Media

## Introduction

Social media has become a ubiquitous part of modern life, with platforms such as Instagram, Twitter, and Facebook serving as essential communication channels. Social media data sets are vast and complex, which makes analysis challenging for businesses and researchers alike. In this project, we explore a simulated social media data set, modeled on tweets, to understand trends in likes across different categories.

## Prerequisites

A basic understanding of Python programming and data analysis concepts is assumed.

## Project Scope

The objective of this project is to analyze simulated tweets (the same approach applies to other social media data) and gain insights into user engagement. We will explore the data set using visualization techniques to understand the distribution of likes across different categories. Finally, we will draw conclusions about the most popular categories and the overall engagement on the platform.

## Step 1: Importing Required Libraries

The first step is to import the libraries used in the project: pandas, numpy, matplotlib, seaborn, and Python's built-in random module.

Pandas is used for data manipulation and analysis. Numpy is used for numerical computations. Matplotlib is used for data visualization, and seaborn builds on it for statistical data visualization. The random module, part of Python's standard library, generates pseudo-random numbers and selections.

In [37]:
import random  # standard-library module, needed for random.choice below

import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
import numpy as np

In [38]:
# Define the list of categories
categories = ['Food', 'Travel', 'Fashion', 'Fitness', 'Music', 'Culture', 'Family', 'Health']

# Number of entries (n)
n = 500

# Generate random data
data = {
    'Date': pd.date_range('2023-09-01', periods=n),
    'Category': [random.choice(categories) for _ in range(n)],
    'Likes': np.random.randint(0, 10000, size=n)
}

# Create a DataFrame
Social_media = pd.DataFrame(data)

# Print the first 10 rows of the DataFrame
print(Social_media.head(10))

        Date Category  Likes
0 2023-09-01    Music   7078
1 2023-09-02    Music   3943
2 2023-09-03   Health   3590
3 2023-09-04     Food     38
4 2023-09-05   Family   2168
5 2023-09-06  Culture   5716
6 2023-09-07     Food     40
7 2023-09-08   Family   2713
8 2023-09-09   Travel   1696
9 2023-09-10     Food   4405
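
Neither `random` nor NumPy is seeded in the cell above, so each run produces different rows and the sample output shown here will not be reproduced exactly. A minimal sketch of a seeded variant follows. The seed value `42`, the helper name `make_social_media`, and the switch to `np.random.default_rng` are assumptions made for illustration and are not part of the original notebook.

```python
import random

import numpy as np
import pandas as pd

SEED = 42  # hypothetical seed value, chosen only for illustration

categories = ['Food', 'Travel', 'Fashion', 'Fitness', 'Music', 'Culture', 'Family', 'Health']
n = 500


def make_social_media(seed=SEED):
    """Build the simulated data set with both random sources seeded."""
    random.seed(seed)                  # seeds random.choice
    rng = np.random.default_rng(seed)  # seeded NumPy generator
    data = {
        'Date': pd.date_range('2023-09-01', periods=n),
        'Category': [random.choice(categories) for _ in range(n)],
        # integers() excludes the upper bound, matching np.random.randint
        'Likes': rng.integers(0, 10000, size=n),
    }
    return pd.DataFrame(data)


first = make_social_media()
second = make_social_media()

# Two calls with the same seed should yield identical DataFrames
print(first.equals(second))
```

With a fixed seed, anyone rerunning the notebook gets the same rows, which makes the printed summaries comparable across runs.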


In [39]:
# Print the first few rows of the DataFrame
print("DataFrame Head:")
print(Social_media.head())

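# Print DataFrame Information
# Note: info() prints its report itself and returns None, so wrapping it in
# print() adds the trailing "None" line seen in the output below
print("\nDataFrame Information:")
print(Social_media.info())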

# Print DataFrame Description
print("\nDataFrame Description:")
print(Social_media.describe())

# Print the count of each 'Category' element
category_counts = Social_media['Category'].value_counts()
print("\nCount of each 'Category' element:")
print(category_counts)

DataFrame Head:
        Date Category  Likes
0 2023-09-01    Music   7078
1 2023-09-02    Music   3943
2 2023-09-03   Health   3590
3 2023-09-04     Food     38
4 2023-09-05   Family   2168

DataFrame Information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype         
---  ------    --------------  -----         
 0   Date      500 non-null    datetime64[ns]
 1   Category  500 non-null    object        
 2   Likes     500 non-null    int64         
dtypes: datetime64[ns](1), int64(1), object(1)
memory usage: 11.8+ KB
None

DataFrame Description:
             Likes
count   500.000000
mean   5059.086000
std    2902.224193
min      12.000000
25%    2554.000000
50%    5078.500000
75%    7658.750000
max    9990.000000

Count of each 'Category' element:
Music      74
Family     71
Fashion    71
Food       66
Culture    62
Fitness    58
Health     51
Travel     47
Name: Category, dtype: int64
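
The category counts above show how many posts fall in each category but say nothing about likes. As one possible next step toward the project's goal, likes can be summarized per category with a `groupby`. This is a sketch only: it rebuilds a comparable data set with an assumed seed so that it runs on its own, and the names `social_media` and `summary` are hypothetical.

```python
import numpy as np
import pandas as pd

categories = ['Food', 'Travel', 'Fashion', 'Fitness', 'Music', 'Culture', 'Family', 'Health']
n = 500

# Assumed seed so the sketch is self-contained and repeatable
rng = np.random.default_rng(0)

social_media = pd.DataFrame({
    'Date': pd.date_range('2023-09-01', periods=n),
    'Category': rng.choice(categories, size=n),
    'Likes': rng.integers(0, 10000, size=n),
})

# Post count, mean likes, and median likes per category,
# ordered from highest to lowest mean
summary = (
    social_media.groupby('Category')['Likes']
    .agg(['count', 'mean', 'median'])
    .sort_values('mean', ascending=False)
)

print(summary)
```

Because the likes are drawn uniformly at random, differences between category means in this simulated data reflect sampling noise rather than real engagement patterns.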
