**Requirement Specification Document**
1. Introduction
This document outlines the requirements for a python-based system to process sensor data and generate reports. The system will read data from CSV files, process the data to calculate monthly averages, maximum and minimum values, and identify outliers based on predefined thresholds.

2. System Overview
The system will consist of the following components:

	Data Ingestion Module: Reads sensor data and threshold data from CSV files.
	Processing Module: Calculates monthly averages, maximum, and minimum values for each sensor.
	Outlier Detection Module: Identifies outliers based on threshold values.
	Reporting Module: Generates two CSV reports: one for monthly statistics and one for outliers.
	Error Handling Module: Manages errors and provides appropriate error codes and messages.

3. Functional Requirements
3.1 Data Ingestion
	Input Files:
	sensor_data.csv: Contains sensor readings.
	Fields: date, sensor_type, value, unit, location_id
	thresholds.csv: Contains threshold values for sensors.
	Fields: sensor_type, min_threshold, max_threshold
	
    Functionality:
	Read and parse the CSV files.
	Validate the data format and content.

3.2 Processing Module
	Monthly Statistics Calculation:
	Calculate the monthly average, maximum, and minimum values for each sensor.
	Generate a CSV file (`monthly_stats.csv`) with the following fields:
	sensor_type, month, avg_value, max_value, min_value

3.3 Outlier Detection
	Outlier Identification:
	Compare sensor values against the thresholds.
	Identify values that fall outside the min and max thresholds.
	Generate a CSV file (`outliers.csv`) with the following fields:
	date, sensor_type, value, unit, location_id, threshold_exceeded [Min/Max]

3.4 Reporting Module
	Output Files:
	monthly_stats.csv: Contains monthly average, maximum, and minimum values.
	outliers.csv: Contains sensor readings that are outliers.

4. Non-Functional Requirements
	Performance: The system should process data efficiently and handle large CSV files.
	Scalability: The system should be able to scale to accommodate additional sensors and data volume.
	Reliability: The system should handle errors gracefully and provide meaningful error messages.
	Resiliency: System should log errors and continue processing even if 
	No thresholds defined for given sensor type 
	Incorrect data format for one of the row in the input csv

5. Error Handling
	Error Codes and Messages:
	ERR001: "File not found" - The specified CSV file could not be located.
	ERR002: "Invalid data format" - The data in the CSV file does not match the expected format.
	ERR003: "Processing error" - An error occurred during data processing.
	ERR004: "Thresholds not defined" - Threshold values for a sensor type are missing.

6. Assumptions and Constraints
	The CSV files are updated regularly and contain accurate data.
	The system will run on a server with sufficient resources to handle the data processing tasks.

7. Glossary
	CSV: Comma-Separated Values, a file format used to store tabular data.
	Sensor: A device that detects and measures physical properties.
	Outlier: A data point that differs significantly from other observations.


In [33]:


#input files
file_path1 = r'C:\Users\MSVPraveenPallapothu\Downloads\sensor_data.csv'
file_path2 = r'C:\Users\MSVPraveenPallapothu\Downloads\thresholds.csv'

#output files
file_path3 = r'C:\Users\MSVPraveenPallapothu\Downloads\monthly_stats.csv'
file_path4 = r'C:\Users\MSVPraveenPallapothu\Downloads\outliers.csv'


print(file_path1)
print(file_path2)
print(file_path3)
print(file_path4)


C:\Users\MSVPraveenPallapothu\Downloads\sensor_data.csv
C:\Users\MSVPraveenPallapothu\Downloads\thresholds.csv
C:\Users\MSVPraveenPallapothu\Downloads\monthly_stats.csv
C:\Users\MSVPraveenPallapothu\Downloads\outliers.csv


In [34]:
def read_csv(file_path):
    data = []
    try:
        with open(file_path, 'r') as file:
            header = file.readline().strip().split(',')
            
            for line in file:
                row = line.strip().split(',')
                data.append(dict(zip(header, row)))
        return data
    
    except FileNotFoundError:
        print(f"ERR001: File not found - The file {file_path} could not be located.")
        return []
    
    except Exception as e:
        print(f"ERR002: Invalid data format - Error reading file {file_path}: {str(e)}")
        return []


In [35]:
def calculate_monthly_statistics(sensor_data):
    monthly_stats = {}
    
    try:
        for row in sensor_data:
            sensor_type = row['sensor_type']
            value = float(row['value'])
            month = row['month']

            if sensor_type not in monthly_stats:
                monthly_stats[sensor_type] = {}
            if month not in monthly_stats[sensor_type]:
                monthly_stats[sensor_type][month] = {'values': [], 'avg': 0, 'max': float('-inf'), 'min': float('inf')}

            monthly_stats[sensor_type][month]['values'].append(value)

            monthly_stats[sensor_type][month]['max'] = max(monthly_stats[sensor_type][month]['max'], value)
            monthly_stats[sensor_type][month]['min'] = min(monthly_stats[sensor_type][month]['min'], value)

        for sensor_type in monthly_stats:
            for month in monthly_stats[sensor_type]:
                values = monthly_stats[sensor_type][month]['values']
                monthly_stats[sensor_type][month]['avg'] = sum(values) / len(values)
    except Exception as e:
        print(f"ERR003: Processing error - Error calculating monthly statistics: {str(e)}")
    return monthly_stats


In [36]:
def detect_outliers(sensor_data, thresholds):
    outliers = []
    
    try:
        for row in sensor_data:
            sensor_type = row['sensor_type']
            value = float(row['value'])

            if sensor_type not in thresholds:
                outliers.append({
                    'date': row['date'],
                    'sensor_type': sensor_type,
                    'value': value,
                    'unit': row['unit'],
                    'location_id': row['location_id'],
                    'threshold_exceeded': 'No threshold defined'
                })
                continue

            min_threshold, max_threshold = thresholds[sensor_type]

            if value < min_threshold:
                outliers.append({
                    'date': row['date'],
                    'sensor_type': sensor_type,
                    'value': value,
                    'unit': row['unit'],
                    'location_id': row['location_id'],
                    'threshold_exceeded': 'Min'
                })
            elif value > max_threshold:
                outliers.append({
                    'date': row['date'],
                    'sensor_type': sensor_type,
                    'value': value,
                    'unit': row['unit'],
                    'location_id': row['location_id'],
                    'threshold_exceeded': 'Max'
                })
    except Exception as e:
        print(f"ERR003: Processing error - Error detecting outliers: {str(e)}")
    return outliers


In [37]:
def write_csv(file_path, data, header):
    try:
        with open(file_path, 'w') as file:
            file.write(','.join(header) + '\n')  # Write header
            for row in data:
                file.write(','.join(map(str, row.values())) + '\n') 
    except Exception as e:
        print(f"ERR002: Invalid data format - Error writing to file {file_path}: {str(e)}")


In [39]:
def main(sensor_data_file, thresholds_file, monthly_stats_file, outliers_file):
    sensor_data = read_csv(sensor_data_file)
    if not sensor_data:
        return

    thresholds = {}
    try:
        thresholds_data = read_csv(thresholds_file)
        if not thresholds_data:
            print("ERR004: Thresholds not defined - Thresholds file is empty or not found.")
            return
        
        for row in thresholds_data:
            thresholds[row['sensor_type']] = (float(row['min_threshold']), float(row['max_threshold']))
    except Exception as e:
        print(f"ERR002: Invalid data format - Error reading thresholds file: {str(e)}")
        return
    

    monthly_stats = calculate_monthly_statistics(sensor_data)
    if not monthly_stats:
        return 
    
    outliers = detect_outliers(sensor_data, thresholds)
    if not outliers:
        print("No outliers detected.")

    monthly_stats_data = []
    for sensor_type, months in monthly_stats.items():
        for month, stats in months.items():
            monthly_stats_data.append({
                'sensor_type': sensor_type,
                'month': month,
                'avg_value': stats['avg'],
                'max_value': stats['max'],
                'min_value': stats['min']
            })
    
    outliers_data = []
    for row in outliers:
        outliers_data.append({
            'date': row['date'],
            'sensor_type': row['sensor_type'],
            'value': row['value'],
            'unit': row['unit'],
            'location_id': row['location_id'],
            'threshold_exceeded': row['threshold_exceeded']
        })
    
    write_csv(monthly_stats_file, monthly_stats_data, ['sensor_type', 'month', 'avg_value', 'max_value', 'min_value'])
    
    write_csv(outliers_file, outliers_data, ['date', 'sensor_type', 'value', 'unit', 'location_id', 'threshold_exceeded'])
    
    print(f"Reports generated: {monthly_stats_file} and {outliers_file}")

main(file_path1,file_path2,file_path3,file_path4)

ERR003: Processing error - Error calculating monthly statistics: could not convert string to float: ''
ERR003: Processing error - Error detecting outliers: could not convert string to float: ''
Reports generated: C:\Users\MSVPraveenPallapothu\Downloads\monthly_stats.csv and C:\Users\MSVPraveenPallapothu\Downloads\outliers.csv
