# Telecommunication Billing Data Pipeline Unit Testing

This project focuses on unit testing for a Telecommunication Billing Data Pipeline. The data pipeline is responsible for extracting data from a CSV file, performing transformations using pandas, and storing the transformed data in another CSV file. The unit tests will ensure that the data pipeline functions correctly and handles various scenarios and edge cases.

## Background Information

The Telecommunication Billing Data Pipeline project aims to process telecommunication billing data. The provided data pipeline consists of three main functions:

- `data_extraction`: Extracts data from a CSV file.
- `data_transformation`: Performs transformations on the extracted data.
- `data_loading`: Loads the transformed data into another CSV file.

## Guidelines for Unit Testing

To effectively test the Telecommunication Billing Data Pipeline, the following guidelines should be followed:

- Use the `unittest` framework to create test cases for each function in the data pipeline.
- Write at least three test cases for each function, covering different scenarios and edge cases.
- Ensure that your tests are independent and do not rely on each other.
- Name your test methods descriptively to indicate the scenario being tested.
- Use assertions to validate the expected behavior of each function.
- Provide informative error messages when assertions fail to aid in debugging.
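
The guidelines above can be illustrated with a minimal sketch. The function `add_total_charges` and the class `TestAddTotalCharges` are hypothetical stand-ins and are not part of the starting code. They only show descriptive test names, per-test setup for independence, and assertion messages.

```python
import unittest

import pandas as pd


def add_total_charges(data):
    """Hypothetical stand-in for a pipeline step: adds billing and tax."""
    result = data.copy()
    result["total_charges"] = result["billing_amount"] + result["tax_amount"]
    return result


class TestAddTotalCharges(unittest.TestCase):
    def setUp(self):
        # A fresh frame per test keeps the tests independent of each other.
        self.data = pd.DataFrame({
            "customer_id": [1, 2],
            "billing_amount": [100.0, 200.0],
            "tax_amount": [10.0, 20.0],
        })

    def test_total_charges_is_sum_of_billing_and_tax(self):
        result = add_total_charges(self.data)
        self.assertEqual(
            result["total_charges"].tolist(), [110.0, 220.0],
            "total_charges should equal billing_amount + tax_amount",
        )

    def test_input_frame_is_not_modified(self):
        add_total_charges(self.data)
        self.assertNotIn(
            "total_charges", self.data.columns,
            "the step should not mutate its input",
        )

    def test_empty_frame_yields_empty_result(self):
        empty = self.data.iloc[0:0]
        result = add_total_charges(empty)
        self.assertEqual(len(result), 0, "empty input should give empty output")


# Run the three tests in-process and keep the outcome for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestAddTotalCharges)
outcome = unittest.TextTestRunner(verbosity=2).run(suite)
```

Each method name states the scenario it covers, so a failure report is readable without opening the test body.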

## Sample Input Dataset

A sample CSV file named `billing_data.csv` has been provided. It contains telecommunication billing data with the following columns: `customer_id`, `billing_amount`, and `tax_amount`. You can use this file as the input for your unit tests.
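
When the provided file is not at hand, a test can build a small file with the same three columns. The sketch below is illustrative only. The row values are hypothetical, and the `$` prefix on `billing_amount` is an assumption inferred from the transformation step, not a confirmed property of the real `billing_data.csv`. The helper `write_sample_csv` is also hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical rows; the real billing_data.csv may contain different values.
SAMPLE_ROWS = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "billing_amount": ["$100.0", "$200.0", "$150.0"],  # '$' prefix is an assumption
    "tax_amount": [10.0, 20.0, 15.0],
})


def write_sample_csv(directory):
    """Write the sample rows to billing_data.csv inside `directory`."""
    path = os.path.join(directory, "billing_data.csv")
    SAMPLE_ROWS.to_csv(path, index=False)
    return path


# A temporary directory keeps the test data away from the project files.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = write_sample_csv(tmp_dir)
    loaded = pd.read_csv(csv_path)

print(loaded.columns.tolist())
print(len(loaded))
```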

## Starting Code

The starting code includes the data pipeline functions (`data_extraction`, `data_transformation`, `data_loading`) and an empty `TestDataPipeline` test class. You need to write the unit tests for each function within this class.

Refer to the comments in the code for further instructions and guidance.

## Running the Unit Tests

To run the unit tests, execute the following code in the Colab notebook:



In [None]:
!python unit_testing.py

The test results will be displayed in the notebook.
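
As an alternative to launching a separate Python process, a test class defined in a notebook cell can be run in-process with the loader and runner that `unittest` provides. This is a minimal sketch. `TestExample` and `run_tests` are hypothetical names used for illustration, and the real test class would take the place of `TestExample`.

```python
import unittest


class TestExample(unittest.TestCase):
    """Hypothetical placeholder; substitute the real test class."""

    def test_addition(self):
        self.assertEqual(1 + 1, 2, "basic arithmetic should hold")


def run_tests(test_case):
    """Run one TestCase class in-process and return the TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case)
    runner = unittest.TextTestRunner(verbosity=2)
    return runner.run(suite)


outcome = run_tests(TestExample)
print("all passed:", outcome.wasSuccessful())
```

Running in-process avoids the `sys.exit` call that `unittest.main()` makes by default, which can interrupt a notebook kernel.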

The code implementing this project assignment should be saved in a file named `unit_testing.py` in the same directory as the CSV files (`billing_data.csv` and `output.csv`).

This code includes one test case for each function (`data_extraction`, `data_transformation`, `data_loading`); the guidelines ask for at least three per function, so these serve as a starting point. Each test compares the expected result with the actual result using an assertion to validate the correctness of the function.
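
`DataFrame.equals` only reports whether two frames match. As a sketch of one way to get a more informative failure message, `pandas.testing.assert_frame_equal` raises an `AssertionError` that describes the mismatch. The two frames below are hypothetical and differ on purpose.

```python
import pandas as pd

# Hypothetical frames that differ in one total_charges value.
expected = pd.DataFrame({"customer_id": [1, 2], "total_charges": [110.0, 220.0]})
actual = pd.DataFrame({"customer_id": [1, 2], "total_charges": [110.0, 225.0]})

# equals() gives only a boolean, with no hint about what differs.
frames_match = actual.equals(expected)
print("equals:", frames_match)

# assert_frame_equal raises AssertionError with a description of the mismatch.
message = ""
try:
    pd.testing.assert_frame_equal(actual, expected)
except AssertionError as error:
    message = str(error)

print(message)
```

Inside a `unittest.TestCase` method, the `AssertionError` raised by `assert_frame_equal` is reported as an ordinary test failure.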

In [None]:
import pandas as pd
import unittest

def data_extraction(file_path):
    data = pd.read_csv(file_path)
    return data

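def data_transformation(data):
    # Copy after de-duplicating so the caller's DataFrame is not modified.
    data = data.drop_duplicates().copy()
    # Cast to str first so numeric input also works; regex=False treats '$' literally.
    data['billing_amount'] = (
        data['billing_amount'].astype(str).str.replace('$', '', regex=False).astype(float)
    )
    data['total_charges'] = data['billing_amount'] + data['tax_amount']
    return data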

def data_loading(data, output_file):
    data.to_csv(output_file, index=False)

class TestDataPipeline(unittest.TestCase):
    def test_data_extraction(self):
        # Test if data is extracted correctly from the CSV file
        expected_result = pd.DataFrame({
            'customer_id': [1, 2, 3, 4, 5],
            'billing_amount': [100.0, 200.0, 300.0, 400.0, 500.0],
            'tax_amount': [10.0, 20.0, 30.0, 40.0, 50.0]
        })
        result = data_extraction('billing_data.csv')
        self.assertTrue(result.equals(expected_result), "Data extraction failed")

    def test_data_transformation(self):
        # Test if data transformation is performed correctly
        input_data = pd.DataFrame({
            'customer_id': [1, 2, 3],
            'billing_amount': [100.0, 200.0, 150.0],
            'tax_amount': [10.0, 20.0, 15.0]
        })
        expected_result = pd.DataFrame({
            'customer_id': [1, 2, 3],
            'billing_amount': [100.0, 200.0, 150.0],
            'tax_amount': [10.0, 20.0, 15.0],
            'total_charges': [110.0, 220.0, 165.0]
        })
        result = data_transformation(input_data)
        self.assertTrue(result.equals(expected_result), "Data transformation failed")

    def test_data_loading(self):
        # Test if data is loaded correctly into the CSV file
        input_data = pd.DataFrame({
            'customer_id': [1, 2, 3],
            'total_charges': [110.0, 220.0, 165.0]
        })
        expected_result = input_data
        data_loading(input_data, 'output.csv')
        result = pd.read_csv('output.csv')
        self.assertTrue(result.equals(expected_result), "Data loading failed")

if __name__ == '__main__':
    unittest.main()
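
The guidelines call for several scenarios per function, so the sketch below adds three edge cases for the transformation step. It is self-contained: `transform` is a stand-in that mirrors `data_transformation` under the assumption that `billing_amount` may arrive either as `$`-prefixed strings or as floats. The class name and input values are hypothetical.

```python
import unittest

import pandas as pd


def transform(data):
    """Stand-in mirroring data_transformation (assumed behavior, see lead-in)."""
    data = data.drop_duplicates().copy()
    data["billing_amount"] = (
        data["billing_amount"].astype(str).str.replace("$", "", regex=False).astype(float)
    )
    data["total_charges"] = data["billing_amount"] + data["tax_amount"]
    return data


class TestTransformEdgeCases(unittest.TestCase):
    def test_duplicate_rows_are_removed(self):
        data = pd.DataFrame({
            "customer_id": [1, 1, 2],
            "billing_amount": ["$100.0", "$100.0", "$200.0"],
            "tax_amount": [10.0, 10.0, 20.0],
        })
        result = transform(data)
        self.assertEqual(result["customer_id"].tolist(), [1, 2],
                         "duplicate rows should be dropped")

    def test_dollar_sign_is_stripped(self):
        data = pd.DataFrame({
            "customer_id": [1],
            "billing_amount": ["$99.5"],
            "tax_amount": [0.5],
        })
        result = transform(data)
        self.assertEqual(result["billing_amount"].tolist(), [99.5],
                         "'$' should be removed before conversion to float")
        self.assertEqual(result["total_charges"].tolist(), [100.0],
                         "total_charges should equal billing_amount + tax_amount")

    def test_missing_column_raises_key_error(self):
        # Without billing_amount the column lookup should fail loudly.
        data = pd.DataFrame({"customer_id": [1], "tax_amount": [1.0]})
        with self.assertRaises(KeyError):
            transform(data)


# Run the edge-case tests in-process and keep the outcome for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestTransformEdgeCases)
outcome = unittest.TextTestRunner(verbosity=2).run(suite)
```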
