# **Guided Lab 343.3.7 - Count the occurrences of unique values in a column**

## **Learning Objective:**
In this lab, you will count the occurrences of unique values in a column of a Pandas DataFrame using the value_counts() function.

By the end of this lab, learners will be able to:
- Use the value_counts() function to count the occurrences of unique values in a column of a Pandas DataFrame.
- Use the resulting summary of unique values and their frequencies to explore and analyze data.







## **Introduction**:
The **`value_counts()`** function in pandas is used to count the occurrences of unique values in a Series (a single column of a DataFrame). It returns a pandas Series with the unique values as the index and the counts of each unique value as the corresponding values in the Series.
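As a minimal sketch of the return value described above, the snippet below inspects the result of `value_counts()` on a small Series. The `colors` data is made up for illustration and is not part of the lab data.

```python
import pandas as pd

# Hypothetical sample data, used only to illustrate the return value
colors = pd.Series(['red', 'blue', 'red', 'green', 'red', 'blue'])

counts = colors.value_counts()

# The result is itself a Series: unique values form the index,
# and the counts are the values (sorted from most to least frequent by default)
print(type(counts).__name__)   # Series
print(counts.index.tolist())   # ['red', 'blue', 'green']
print(counts.tolist())         # [3, 2, 1]
```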

Here are some common use cases for value_counts():

## **Example 1: Frequency Analysis:**

To understand the distribution of values in a categorical variable.
Example: Count the number of occurrences of each category in a column representing product categories, customer segments, etc.

In [3]:
# import pandas library to the codespace
import pandas as pd

In [2]:
# Sample DataFrame
data = {'Category': ['Electronics', 'Clothing', 'Electronics', 'Books', 'Books', 'Clothing']}
df = pd.DataFrame(data)

# Count the occurrences of each category
category_counts = df['Category'].value_counts()
print(category_counts)

Category
Electronics    2
Clothing       2
Books          2
Name: count, dtype: int64
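Because the result is an ordinary Series, it can be queried like one. The sketch below rebuilds the same `Category` data and shows a few follow-up operations that may be useful; which of them you need depends on your analysis.

```python
import pandas as pd

# Same 'Category' data as in the example above
df = pd.DataFrame({'Category': ['Electronics', 'Clothing', 'Electronics',
                                'Books', 'Books', 'Clothing']})
category_counts = df['Category'].value_counts()

# Look up the count for a single category by its label
books_count = category_counts['Books']
print(books_count)  # 2

# Order the result alphabetically by category instead of by count
alphabetical = category_counts.sort_index()
print(alphabetical.index.tolist())  # ['Books', 'Clothing', 'Electronics']

# Convert the counts to a plain dictionary
counts_dict = category_counts.to_dict()
print(counts_dict)
```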


## **Example 2: Frequency Analysis:**
The example below counts how often each brand appears in the ‘Brand’ column and prints the brand that occurs more than once.

In [8]:
# Create the data source
sales_data = {'Devices': ['Laptop', 'iPhone', 'LED', 'LCD', 'Smart-Phone', 'Washing-Machine'],
              'Brand': ['Lenovo', 'Apple', 'Samsung', 'Samsung', 'Samsung', 'Whirlpool'],
              'Sales': [1000, 2000, 4000, 2000, 1000, 4000],
              'Profit': [500, 1000, 1000, 1500, 1000, 1500],
              'Pieces left': [5000, 4000, 4000, 5000, 5000, 1000]}
# Create the pandas DataFrame
df = pd.DataFrame(sales_data)
print(df)
# Frequency of each brand in the 'Brand' column
Frequency = df['Brand'].value_counts()
print("Print Frequency:\n", Frequency)
# Select the first brand that occurs more than once
Frequent_product = Frequency[Frequency > 1].index[0]
print("Frequent Products:\n", Frequent_product)
# Display the brands along with their frequency
# (display() is available in Jupyter/IPython; use print() in a plain Python script)
display(Frequency)
# Print the brand that occurs more than once
print(" This item appears more than once:", Frequent_product)

           Devices      Brand  Sales  Profit  Pieces left
0           Laptop     Lenovo   1000     500         5000
1           iPhone      Apple   2000    1000         4000
2              LED    Samsung   4000    1000         4000
3              LCD    Samsung   2000    1500         5000
4      Smart-Phone    Samsung   1000    1000         5000
5  Washing-Machine  Whirlpool   4000    1500         1000
Print Frequency:
 Brand
Samsung      3
Lenovo       1
Apple        1
Whirlpool    1
Name: count, dtype: int64
Frequent Products:
 Samsung


Brand
Samsung      3
Lenovo       1
Apple        1
Whirlpool    1
Name: count, dtype: int64

 This item appears more than once: Samsung
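Note that `.index[0]` in the example above returns only the first value whose count exceeds 1. If several values repeat, one possible approach is to keep the whole filtered index. The sketch below uses a hypothetical `brands` Series (not the lab data) in which two brands repeat.

```python
import pandas as pd

# Hypothetical data with two repeated brands, for illustration only
brands = pd.Series(['Samsung', 'Apple', 'Samsung', 'Apple', 'Lenovo', 'Samsung'],
                   name='Brand')

brand_counts = brands.value_counts()

# Boolean filtering keeps every brand whose count is greater than 1
repeated = brand_counts[brand_counts > 1]

# Collect all repeated brands instead of only the first one
repeated_brands = repeated.index.tolist()
print(repeated_brands)  # ['Samsung', 'Apple']
```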


## **Example 3: Checking Missing Values:**

To quickly identify missing values in a column.
Example: Count the occurrences of each value in a column, including missing values (NaN), and check for placeholder values that stand out, such as 0 or -1.

In [None]:
# Sample DataFrame with missing values
data = {'Score': [85, 92, 88, 75, None, 90, None, 85]}
df = pd.DataFrame(data)

# Count the occurrences of each value in the 'Score' column
# dropna=True (the default) excludes NaN values from the counts
# dropna=False includes NaN values in the counts
score_counts = df['Score'].value_counts(dropna=False)
print(score_counts)


Score
85.0    2
NaN     2
92.0    1
88.0    1
75.0    1
90.0    1
Name: count, dtype: int64
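To cross-check the NaN count reported above, one option is to compare it with `isna().sum()`. This is a minimal sketch on the same `Score` values, not a required step of the lab.

```python
import pandas as pd

# Same 'Score' values as in the example above
scores = pd.Series([85, 92, 88, 75, None, 90, None, 85], name='Score')

with_nan = scores.value_counts(dropna=False)   # NaN is counted
without_nan = scores.value_counts()            # dropna=True is the default

# isna() marks missing entries; summing the booleans counts them
missing = scores.isna().sum()
print(missing)  # 2

# The two totals differ by exactly the number of missing values
print(with_nan.sum(), without_nan.sum())  # 8 6
```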


## **Example 4: Checking Data Quality:**

To quickly identify potential issues with data quality.
Example: Identify unexpected or outlier values in a numerical column.

In [11]:
# Sample DataFrame
data = {'Age': [25, 30, 25, 35, 25, 40, 25, 30, 45, 25, 30, 25]}
df = pd.DataFrame(data)

# Count the occurrences of each age
age_counts = df['Age'].value_counts()
age_counts2 = df['Age'].value_counts(normalize=True) # normalize=True returns proportions (fractions of the total) instead of counts
print(age_counts)
print('Percentage:')
print(age_counts2)


Age
25    6
30    3
35    1
40    1
45    1
Name: count, dtype: int64
Percentage:
Age
25    0.500000
30    0.250000
35    0.083333
40    0.083333
45    0.083333
Name: proportion, dtype: float64
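The values returned with `normalize=True` are fractions that sum to 1, not percentages. The sketch below converts them to percentages and lists the ages that appear only once. Treating a count of 1 as "worth a closer look" is an arbitrary assumption for this illustration, not a general rule for detecting outliers.

```python
import pandas as pd

# Same 'Age' values as in the example above
ages = pd.Series([25, 30, 25, 35, 25, 40, 25, 30, 45, 25, 30, 25], name='Age')

# Multiply the proportions by 100 to express them as percentages
age_pct = (ages.value_counts(normalize=True) * 100).round(1)
print(age_pct)

# Ages that occur only once (the threshold of 1 is an assumption for this sketch)
counts = ages.value_counts()
rare_ages = sorted(counts[counts == 1].index.tolist())
print(rare_ages)  # [35, 40, 45]
```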


## **Submission Instructions**
- Submit your completed lab using the Start Assignment button on the assignment page in Canvas.
- Your submission can be either of the following:
  - If you are using a notebook, write and submit all tasks in a single notebook file, for example: (**your_name_labname.ipynb**).
  - If you are using a Python script, write and submit all tasks in a single Python script file, for example: (**your_name_labname.py**).
- Add appropriate comments and any additional instructions if required.

