# Deleting duplicates

Consider a practical use case: you are building a system to analyze customer orders from an e-commerce website. Duplicate orders sometimes end up in the system, and they must be removed from the dataset before further processing so that analysis and reporting stay accurate.

# Explanation:

The remove_duplicates function takes a list of orders as input.

It initializes an empty list unique_orders to store unique orders and a set seen_order_ids to keep track of seen order IDs.

It iterates through each order in the input list.

For each order, it extracts the order ID.

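If the order ID is not yet in seen_order_ids, this is the first time the order has appeared, so the function appends the order to unique_orders and adds its ID to seen_order_ids. Any later order with the same ID is skipped, so only the first occurrence is kept.

Finally, it returns unique_orders, which lists the unique orders in the order they first appeared in the input.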

In the example usage, we create a list of orders containing some duplicates. We then call the remove_duplicates function to get the list of unique orders and print the result.


In [1]:
def remove_duplicates(orders):
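    """
    Removes duplicate orders from a list of orders.

    Two orders are treated as duplicates when they share the same
    'order_id'; only the first occurrence of each ID is kept.

    Args:
        orders (list): A list of orders, each a dictionary with an
            'order_id' key.

    Returns:
        list: The unique orders, in their original order.
    """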
    unique_orders = []
    seen_order_ids = set()

    for order in orders:
        order_id = order['order_id']
        if order_id not in seen_order_ids:
            unique_orders.append(order)
            seen_order_ids.add(order_id)

    return unique_orders

In [2]:
# Example usage:
orders = [
    {'order_id': 101, 'product': 'Laptop', 'quantity': 1},
    {'order_id': 102, 'product': 'Mouse', 'quantity': 2},
    {'order_id': 101, 'product': 'Laptop', 'quantity': 1},  # Duplicate order
    {'order_id': 103, 'product': 'Keyboard', 'quantity': 1},
    {'order_id': 102, 'product': 'Mouse', 'quantity': 2},  # Duplicate order
]

In [4]:
unique_orders = remove_duplicates(orders)
print(unique_orders)

[{'order_id': 101, 'product': 'Laptop', 'quantity': 1}, {'order_id': 102, 'product': 'Mouse', 'quantity': 2}, {'order_id': 103, 'product': 'Keyboard', 'quantity': 1}]
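
The function above treats two orders as duplicates whenever their order_id matches, even if the rest of the record differs. As a minimal sketch of a more general version, the hypothetical helper remove_duplicates_by below (not part of the original notebook) accepts a key function so the caller decides what counts as a duplicate. It assumes the key values are hashable; the sample_orders data is made up for illustration.

```python
def remove_duplicates_by(items, key):
    """Keep the first item for each distinct key(item), preserving input order.

    Hypothetical helper for illustration; key(item) must return a hashable value.
    """
    seen = set()
    unique_items = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            unique_items.append(item)
    return unique_items


sample_orders = [
    {'order_id': 101, 'product': 'Laptop', 'quantity': 1},
    {'order_id': 102, 'product': 'Mouse', 'quantity': 2},
    {'order_id': 101, 'product': 'Laptop', 'quantity': 1},  # same ID, same content
    {'order_id': 101, 'product': 'Laptop', 'quantity': 3},  # same ID, different quantity
]

# Deduplicate by order ID only: the quantity-3 row is dropped as a duplicate.
by_id = remove_duplicates_by(sample_orders, key=lambda o: o['order_id'])

# Deduplicate by full content: the quantity-3 row is kept because it differs.
by_content = remove_duplicates_by(
    sample_orders, key=lambda o: tuple(sorted(o.items()))
)

print(len(by_id))       # 2
print(len(by_content))  # 3
```

Which key is appropriate depends on the data: matching on the ID alone silently discards a later record that reuses an ID with different contents, while matching on the full content keeps it.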


# Removing duplicates with pandas drop_duplicates()

drop_duplicates() is a pandas DataFrame method that removes duplicate rows. The example below demonstrates it and is followed by an explanation.

In [6]:
import pandas as pd

In [7]:
# Creating a sample DataFrame with duplicate rows
data = {
    'A': [1, 2, 3, 4, 2, 3, 5, 6, 7, 7],
    'B': ['a', 'b', 'c', 'd', 'b', 'c', 'e', 'f', 'g', 'g']
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

Original DataFrame:
   A  B
0  1  a
1  2  b
2  3  c
3  4  d
4  2  b
5  3  c
6  5  e
7  6  f
8  7  g
9  7  g


In [8]:
# Removing duplicate rows using drop_duplicates()
unique_df = df.drop_duplicates()

In [9]:
# Displaying the DataFrame after removing duplicates
print("\nDataFrame after removing duplicates:")
print(unique_df)


DataFrame after removing duplicates:
   A  B
0  1  a
1  2  b
2  3  c
3  4  d
6  5  e
7  6  f
8  7  g
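
Before dropping rows, it can help to see which ones pandas considers duplicates. The sketch below rebuilds the same sample DataFrame and uses DataFrame.duplicated(), which returns a boolean Series marking every row that repeats an earlier one. The variable names dup_mask and dup_rows are illustrative and do not appear in the original notebook.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 2, 3, 5, 6, 7, 7],
    'B': ['a', 'b', 'c', 'd', 'b', 'c', 'e', 'f', 'g', 'g'],
})

# True for each row identical to an earlier row; first occurrences stay False
dup_mask = df.duplicated()

# The rows that drop_duplicates() would remove
dup_rows = df[dup_mask]
print(dup_rows)

# Number of duplicate rows
print(dup_mask.sum())
```

The flagged rows carry index labels 4, 5, and 9, which are exactly the labels missing from the deduplicated output above.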


# Explanation:

Importing pandas:

import pandas as pd imports the pandas library under the alias pd.

Creating the sample DataFrame:

A sample DataFrame df is created from the dictionary data. It has two columns, 'A' and 'B', and three rows (index 4, 5, and 9) that repeat earlier rows.

Printing the original DataFrame:

The original DataFrame df is printed to show its contents before duplicates are removed.

Removing duplicates using drop_duplicates():

drop_duplicates() is called on df. By default, a row counts as a duplicate only when its values match an earlier row in every column, and the first occurrence is kept. The result is returned as a new DataFrame, stored in unique_df; df itself is unchanged. The surviving rows keep their original index labels, which is why the index in the output jumps from 3 to 6.

Printing the DataFrame after removing duplicates:

unique_df is printed to show the result.
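
To tie the two approaches together, the orders list from the first example can be loaded into a DataFrame and deduplicated by order ID. This is a sketch rather than part of the original notebook: it uses the subset, keep, and ignore_index parameters of drop_duplicates(), and ignore_index assumes pandas 1.0 or later. The names orders_df, unique_orders_df, and last_orders_df are illustrative.

```python
import pandas as pd

orders = [
    {'order_id': 101, 'product': 'Laptop', 'quantity': 1},
    {'order_id': 102, 'product': 'Mouse', 'quantity': 2},
    {'order_id': 101, 'product': 'Laptop', 'quantity': 1},  # Duplicate order
    {'order_id': 103, 'product': 'Keyboard', 'quantity': 1},
    {'order_id': 102, 'product': 'Mouse', 'quantity': 2},  # Duplicate order
]
orders_df = pd.DataFrame(orders)

# Compare only the order_id column, keep the first row seen for each ID,
# and renumber the index from 0 instead of keeping the original labels.
unique_orders_df = orders_df.drop_duplicates(
    subset='order_id', keep='first', ignore_index=True
)
print(unique_orders_df)

# keep='last' keeps the final occurrence of each ID instead.
last_orders_df = orders_df.drop_duplicates(subset='order_id', keep='last')
print(last_orders_df)
```

With subset='order_id', pandas ignores the other columns when deciding what is a duplicate, which mirrors the behavior of remove_duplicates in the first example.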