A collection of Python projects covering signal processing, time series analysis, graph algorithms, and machine learning techniques for real-world data challenges.
This repository implements data representation and analysis techniques across three assignments (audio processing, time series and graphs, and machine learning), with an emphasis on Python programming, mathematical modeling, and data science methodology.
Exploring the frequency domain of audio signals through advanced DSP techniques
- Audio File Processing - Load and play back WAV files using Python libraries
- Fast Fourier Transform (FFT) - Convert time-domain signals to frequency representations
- Signal Combination - Merge frequency domain representations of multiple audio sources
- Inverse FFT (IFFT) - Reconstruct time-domain signals from frequency data
- Short-Time Fourier Transform (STFT) - Analyze time-frequency characteristics
- Spectrogram Visualization - Create visual representations of audio data
- Real-time audio processing and playback
- Advanced frequency domain analysis and manipulation
- Time-frequency decomposition for complex audio signals
- Interactive spectrogram interpretation
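
The FFT, signal combination, IFFT, and STFT steps listed above can be sketched roughly as follows. This is a minimal illustration, not the assignment's actual code: it uses two synthetic sine tones instead of WAV files, `numpy.fft` instead of `scipy.fft`, and a hand-rolled framed transform as a simplified stand-in for `scipy.signal.stft`. The sample rate, tone frequencies, and frame sizes are all assumed values.

```python
import numpy as np

sample_rate = 8000                          # Hz (assumed for this sketch)
t = np.arange(sample_rate) / sample_rate    # one second of sample times

# Two synthetic "audio sources" standing in for loaded WAV data.
tone_a = np.sin(2 * np.pi * 440 * t)
tone_b = 0.5 * np.sin(2 * np.pi * 1000 * t)

# FFT: time domain -> frequency domain.
spec_a = np.fft.fft(tone_a)
spec_b = np.fft.fft(tone_b)

# Signal combination: the FFT is linear, so adding spectra mixes the sources.
combined_spec = spec_a + spec_b

# IFFT: back to the time domain (the imaginary part is only rounding noise).
combined = np.fft.ifft(combined_spec).real

# Locate the two strongest positive-frequency components.
freqs = np.fft.fftfreq(len(t), d=1 / sample_rate)
magnitude = np.abs(combined_spec)
positive = freqs > 0
strongest = np.argsort(magnitude[positive])[-2:]
peak_freqs = np.sort(freqs[positive][strongest])
print("Dominant frequencies (Hz):", peak_freqs)

# Simplified STFT: Hann-windowed, half-overlapping frames, one FFT per frame.
frame, hop = 256, 128
window = np.hanning(frame)
frames = [combined[i:i + frame] * window
          for i in range(0, len(combined) - frame + 1, hop)]
stft_mag = np.abs(np.fft.rfft(frames, axis=1))   # rows: time frames, cols: frequency bins
print("STFT magnitude shape:", stft_mag.shape)
```

Plotting `stft_mag.T` with `plt.imshow` or `plt.pcolormesh` (usually on a log scale) would give a basic spectrogram of the mixed signal.
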
Advanced temporal data analysis and graph theory implementations
- Data Preprocessing - Pandas-based datetime parsing and indexing
- Temporal Visualization - Multi-variable time series plotting with correlation analysis
- Decomposition Analysis - Trend and seasonal component extraction
- Signal Smoothing - Simple Moving Average (SMA) and Exponential Moving Average (EMA)
- Stationarity Testing - Augmented Dickey-Fuller (ADF) statistical tests
- Differencing Techniques - First-order differencing to achieve stationarity
- Dijkstra's Algorithm - Shortest path computation from Chicago O'Hare (ORD)
- Minimum Spanning Tree - Both Prim's and Kruskal's algorithm implementations
- Weighted Graph Processing - Real-world airport network analysis
- Libraries: Pandas, NumPy, Matplotlib, NetworkX, Statsmodels
- Algorithms: Advanced graph traversal and optimization techniques
- Statistical Methods: Time series stationarity testing and decomposition
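
As a rough sketch of the two halves of this assignment, the block below smooths and differences a synthetic series with pandas, then runs Dijkstra's algorithm from ORD and Kruskal's algorithm on a toy airport graph. Everything specific here is an assumption made for illustration: the series is generated, the edge weights are made-up distances, and the helper names `dijkstra` and `kruskal` are hypothetical. The ADF test and Prim's algorithm are left out to keep the example short.

```python
import heapq

import numpy as np
import pandas as pd

# --- Time series: smoothing and first-order differencing (synthetic data) ---
idx = pd.date_range("2024-01-01", periods=60, freq="D")
rng = np.random.default_rng(0)
series = pd.Series(np.linspace(10.0, 40.0, 60) + rng.normal(0, 1, 60), index=idx)

sma = series.rolling(window=7).mean()           # simple moving average
ema = series.ewm(span=7, adjust=False).mean()   # exponential moving average
diffed = series.diff().dropna()                 # first-order differencing

# --- Graph: toy undirected airport network with made-up distances ---
edges = [("ORD", "JFK", 740), ("ORD", "DFW", 802), ("JFK", "ATL", 760),
         ("DFW", "ATL", 731), ("ORD", "ATL", 606), ("DFW", "LAX", 1235)]


def dijkstra(edges, source):
    """Return the shortest distance from source to every reachable node."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                            # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def kruskal(edges):
    """Return the edges of a minimum spanning tree using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]       # path halving
            x = parent[x]
        return x

    tree = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:                    # edge joins two components
            parent[root_u] = root_v
            tree.append((u, v, w))
    return tree


shortest = dijkstra(edges, "ORD")
mst = kruskal(edges)
print("Shortest distances from ORD:", shortest)
print("MST edges:", mst)
```

With NetworkX, the same graph could be built via `add_weighted_edges_from` and queried with `nx.single_source_dijkstra_path_length` and `nx.minimum_spanning_tree`; the plain-Python version is shown only to make the mechanics visible.
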
Dual-domain ML applications: NLP sentiment analysis and computer vision
- Text Preprocessing Pipeline - Tokenization, stopword removal, stemming/lemmatization
- Feature Engineering - Advanced text vectorization techniques
- Data Visualization - Word frequency plots, word clouds, distribution analysis
- Binary Classification - Logistic Regression vs Support Vector Machine comparison
- Performance Metrics - Comprehensive accuracy reporting and confusion matrices
- Computer Vision Processing - Olivetti Faces dataset manipulation
- Image Preprocessing - Vertical splitting and data preparation
- Support Vector Regression - Left-to-right face completion model
- Visual Results - Side-by-side reconstruction comparisons
- Bonus Challenge - Random Forest Regression implementation and comparison
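
The left-to-right completion idea can be sketched as below. This is an illustrative setup rather than the assignment's actual code: it substitutes scikit-learn's bundled 8x8 digits images for the Olivetti Faces dataset so the example runs without a download, and it wraps `SVR` in `MultiOutputRegressor` because `SVR` predicts a single target while each right half contains many pixels. The subset size, split ratio, and hyperparameters are assumed values.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

# Stand-in data: 400 digit images of 8x8 pixels, scaled to [0, 1].
images = load_digits().images[:400] / 16.0

# Vertical split: the left half is the input, the right half is the target.
half = images.shape[2] // 2
left = images[:, :, :half].reshape(len(images), -1)
right = images[:, :, half:].reshape(len(images), -1)

X_train, X_test, y_train, y_test = train_test_split(
    left, right, test_size=0.25, random_state=0
)

# One SVR per output pixel; the random forest handles multiple outputs natively.
svr = MultiOutputRegressor(SVR(kernel="rbf", C=1.0)).fit(X_train, y_train)
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)

for name, model in [("SVR", svr), ("Random forest", forest)]:
    pred = model.predict(X_test)
    print(name, "MSE:", mean_squared_error(y_test, pred))

# Stitch the true left halves to the predicted right halves for side-by-side viewing.
height = images.shape[1]
completed = np.concatenate(
    [X_test.reshape(-1, height, half), svr.predict(X_test).reshape(-1, height, half)],
    axis=2,
)
print("Completed image batch shape:", completed.shape)
```

To run the same sketch on faces, `sklearn.datasets.fetch_olivetti_faces` could replace `load_digits` (it downloads the data on first use and its images are already scaled to [0, 1], so the division by 16 would be dropped). Fitting one SVR per pixel becomes much slower at that image size.
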
# Core Data Science Stack
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Signal Processing
from scipy.fft import fft, ifft
from scipy.signal import stft
import librosa
import soundfile as sf
# Machine Learning
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC, SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import accuracy_score, confusion_matrix
# Time Series Analysis
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller
# Graph Algorithms
import networkx as nx
import heapq

- Signal Processing - FFT/IFFT, STFT, spectrogram analysis
- Time Series Analysis - Decomposition, stationarity, smoothing techniques
- Graph Theory - Shortest path algorithms, MST implementations
- Machine Learning - Classification, regression, model comparison
- Data Visualization - Advanced plotting and interpretation
- Statistical Analysis - Hypothesis testing, performance evaluation

- Clone the repository: `git clone [repository-url]`, then `cd python-data-representation-projects`
- Install dependencies: `pip install -r requirements.txt`
- Run individual assignments:
  - `python assignment_6_audio_processing.py`
  - `python assignment_7_timeseries_graphs.py`
  - `python assignment_8_ml_classification.py`

This repository demonstrates advanced proficiency in:
- Data Science Methodologies - End-to-end analysis pipelines
- Algorithm Implementation - From mathematical concepts to working code
- Real-World Applications - Audio processing, financial time series, NLP, computer vision
- Performance Optimization - Efficient data structures and processing techniques
- Real-time audio processing dashboard
- Interactive time series forecasting models
- Deep learning implementations for image reconstruction
- Graph neural network applications
- Deployment-ready ML model APIs