Income Prediction Project

Project Overview

The Income Prediction project is an end-to-end machine learning project organized as modular code. It covers data ingestion, data transformation, model training, and deployment.

Problem Statement

The objective is to predict whether an individual's annual income exceeds $50,000 from demographic and employment-related features, which makes this a binary classification problem. Predictions of this kind can support decision-making and resource allocation.

Dataset Overview

The dataset includes features such as age, workclass, education, marital status, occupation, and race, among others. The target variable is 'salary', which indicates whether an individual earns more than $50,000 per year or not.

Project Structure Automation

The project starts with a template.py script that creates the project's folder structure automatically. This gives every checkout the same layout, which simplifies collaboration and maintenance.
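A minimal sketch of what such a scaffolding script could look like. The top-level folder names come from the repository layout; the individual file names and the create_structure helper are assumptions for illustration, not the contents of the actual template.py.

```python
from pathlib import Path

# Hypothetical file list: folder names follow the repository layout,
# but the module names inside income_pred/ are illustrative guesses.
PROJECT_FILES = [
    "income_pred/__init__.py",
    "income_pred/components/__init__.py",
    "income_pred/components/data_ingestion.py",
    "income_pred/components/data_transformation.py",
    "income_pred/components/model_trainer.py",
    "configs/config.yaml",
    "templates/index.html",
    "notebook/data/.gitkeep",
]


def create_structure(root, files=PROJECT_FILES):
    """Create every missing folder and empty file under root.

    Returns the list of files that were newly created, so running the
    script twice is harmless and reports nothing the second time.
    """
    created = []
    for relative in files:
        path = Path(root) / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():
            path.touch()
            created.append(path)
    return created
```

Calling create_structure(".") from the repository root would lay out the folders in place; existing files are left untouched.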

Version Management

A setup.py file declares the package metadata, including its version and dependencies. This makes the project installable as a package and helps reproduce the same environment on different machines.
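The following sketch shows one common way to assemble that metadata, assuming the dependencies live in requirements.txt. The helper names, the version string, and the handling of an "-e ." line are assumptions, not the project's actual setup.py.

```python
from pathlib import Path

# An editable-install marker sometimes kept in requirements.txt;
# it must not be passed to install_requires.
EDITABLE_MARKER = "-e ."


def get_requirements(path):
    """Read requirement specifiers, skipping blanks, comments, and '-e .'."""
    lines = [line.strip() for line in Path(path).read_text().splitlines()]
    return [
        line
        for line in lines
        if line and not line.startswith("#") and line != EDITABLE_MARKER
    ]


def build_setup_kwargs(requirements_path="requirements.txt"):
    """Return keyword arguments for setuptools.setup()."""
    return {
        "name": "income_pred",       # package folder name in the repository
        "version": "0.0.1",          # placeholder version, not the real one
        "packages": ["income_pred"],
        "install_requires": get_requirements(requirements_path),
    }
```

In a real setup.py these arguments would be passed on with `from setuptools import setup` followed by `setup(**build_setup_kwargs())`.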

Data Ingestion

Data Loading and Artifacts

In the data ingestion phase, the dataset is loaded and an 'artifacts' folder is created to store the files generated throughout the project.
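As a rough illustration, the ingestion step could be sketched with pandas as below. The ingest function, the raw.csv file name, and the assumption that the source is a CSV file are all hypothetical.

```python
from pathlib import Path

import pandas as pd


def ingest(source_csv, artifacts_dir="artifacts"):
    """Load the dataset and keep a raw copy in the artifacts folder.

    Returns the loaded DataFrame and the path of the saved copy.
    """
    out_dir = Path(artifacts_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create 'artifacts' if missing

    df = pd.read_csv(source_csv)

    raw_path = out_dir / "raw.csv"  # hypothetical artifact name
    df.to_csv(raw_path, index=False)
    return df, raw_path
```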

Data Splitting

The dataset is split into training and testing sets so that model performance is measured on data the model has not seen during training.
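One way to sketch the split uses scikit-learn's train_test_split. The 80/20 ratio, the fixed seed, the stratification on 'salary', and the output file names are assumptions; the source does not state which settings the project uses.

```python
from pathlib import Path

import pandas as pd
from sklearn.model_selection import train_test_split


def split_and_save(df, artifacts_dir="artifacts", target="salary",
                   test_size=0.2, seed=42):
    """Split df into train/test sets and write both to the artifacts folder."""
    train_df, test_df = train_test_split(
        df,
        test_size=test_size,
        random_state=seed,      # fixed seed so the split is reproducible
        stratify=df[target],    # keep the class ratio equal in both sets
    )
    out_dir = Path(artifacts_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    train_df.to_csv(out_dir / "train.csv", index=False)
    test_df.to_csv(out_dir / "test.csv", index=False)
    return train_df, test_df
```

Stratifying is a design choice worth considering here because the two income classes are usually not equally frequent.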

Data Transformation

Label Encoding and Columns Transformation

Data transformation label-encodes the categorical variables and applies a column transformer so that preprocessing is defined in one place. The transformed data is then passed to the model training stage.
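A possible shape for this step with scikit-learn is sketched below. Note the swap: LabelEncoder is applied to the target only, and the feature columns use OrdinalEncoder, its counterpart for feature matrices, inside a ColumnTransformer. The scaling of numeric columns, the column lists, and the prepare function are assumptions.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder, StandardScaler


def prepare(train_df, test_df, numeric_cols, categorical_cols, target="salary"):
    """Fit preprocessing on the training set and apply it to both sets."""
    preprocessor = ColumnTransformer(
        transformers=[
            ("num", StandardScaler(), numeric_cols),
            # Categories unseen during fitting are mapped to -1 instead of failing.
            ("cat",
             OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
             categorical_cols),
        ]
    )
    X_train = preprocessor.fit_transform(train_df)
    X_test = preprocessor.transform(test_df)  # reuse statistics from training

    label_encoder = LabelEncoder()
    y_train = label_encoder.fit_transform(train_df[target])
    y_test = label_encoder.transform(test_df[target])
    return X_train, X_test, y_train, y_test, preprocessor
```

Fitting only on the training set keeps information from the test set out of the preprocessing statistics.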

Model Training

Models Considered

The project compares three models: Random Forest Classifier, Decision Tree Classifier, and Logistic Regression. Grid search with cross-validation (Grid Search CV) is used for hyperparameter tuning.
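The tuning loop can be sketched with scikit-learn's GridSearchCV as follows. The parameter grids, the number of folds, and the tune_models helper are illustrative assumptions; the source does not list the values the project searches over.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Hypothetical search spaces, kept small so the sketch runs quickly.
CANDIDATES = {
    "random_forest": (
        RandomForestClassifier(random_state=42),
        {"n_estimators": [50, 100], "max_depth": [None, 10]},
    ),
    "decision_tree": (
        DecisionTreeClassifier(random_state=42),
        {"max_depth": [None, 5, 10]},
    ),
    "logistic_regression": (
        LogisticRegression(max_iter=1000),
        {"C": [0.1, 1.0, 10.0]},
    ),
}


def tune_models(X, y, candidates=CANDIDATES, cv=3):
    """Run a cross-validated grid search for every candidate model.

    Returns a dict mapping model name to its fitted GridSearchCV object,
    which exposes best_params_, best_score_, and best_estimator_.
    """
    tuned = {}
    for name, (estimator, grid) in candidates.items():
        search = GridSearchCV(estimator, grid, cv=cv, scoring="accuracy")
        search.fit(X, y)
        tuned[name] = search
    return tuned
```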

Model Selection

After evaluation, the Random Forest Classifier is the best-performing model, with an accuracy of 81%.
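A selection step along these lines could compare the candidates by accuracy on the held-out test set. The select_best helper is hypothetical, and it assumes each model is already fitted and exposes a predict method.

```python
from sklearn.metrics import accuracy_score


def select_best(models, X_test, y_test):
    """Score each fitted model on the test set and pick the most accurate.

    models maps a name to any object with a predict(X) method.
    Returns the winning name and the full name-to-accuracy mapping.
    """
    scores = {
        name: accuracy_score(y_test, model.predict(X_test))
        for name, model in models.items()
    }
    best_name = max(scores, key=scores.get)
    return best_name, scores
```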

Deployment with Flask

The project is deployed with the Flask framework, which provides a web interface through which users can interact with the income prediction model.
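A minimal sketch of a Flask prediction endpoint is shown below. It exposes a JSON route rather than the HTML form the project presumably serves from its templates folder; the /predict route, the create_app factory, and the predict_fn callback are assumptions made for illustration.

```python
from flask import Flask, jsonify, request


def create_app(predict_fn):
    """Build a Flask app around predict_fn.

    predict_fn is a hypothetical callable that takes a dict of feature
    values and returns the predicted income label as a string.
    """
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        # silent=True returns None instead of raising on a non-JSON body
        payload = request.get_json(silent=True)
        if not isinstance(payload, dict):
            return jsonify(error="expected a JSON object"), 400
        return jsonify(prediction=predict_fn(payload))

    return app
```

Passing the model in as a callable keeps the web layer separate from training code, so the app can be exercised with a stub in tests.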

Logger and Exception Handling

Custom exception handling and logging are implemented to make failures easier to trace and debug, which improves the project's maintainability.
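The sketch below shows one way to build such helpers with the standard library only. The names get_logger and PipelineError, the log format, and the choice to record the file and line of the original failure are assumptions, not the project's actual implementation.

```python
import logging
import sys


def get_logger(name="income_pred", level=logging.INFO):
    """Return a logger with one stream handler, configured only once."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("[%(asctime)s] %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(level)
    return logger


class PipelineError(Exception):
    """Exception that remembers where the currently handled error occurred."""

    def __init__(self, message):
        super().__init__(message)
        self.message = message
        # Traceback of the exception being handled, if there is one.
        _, _, tb = sys.exc_info()
        while tb is not None and tb.tb_next is not None:
            tb = tb.tb_next  # walk to the innermost frame
        self.filename = tb.tb_frame.f_code.co_filename if tb else None
        self.lineno = tb.tb_lineno if tb else None

    def __str__(self):
        if self.filename is None:
            return self.message
        return f"{self.message} (at {self.filename}:{self.lineno})"
```

Raised inside an except block, PipelineError reports the location of the underlying failure; raised elsewhere, it falls back to the plain message.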

Data Visualization

[Figure: Distribution of numerical features]

[Figure: Income vs. Workclass]

[Figure: Income vs. Education]

[Figure: Income vs. Marital Status]

[Figure: Income vs. Occupation]

[Figure: Income vs. Relationship]

[Figure: Income vs. Sex]
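Comparisons of income against a categorical feature can be computed as a table of class shares before plotting. This is a sketch with pandas; the income_share_by helper is hypothetical, and the project's own plots may have been produced differently.

```python
import pandas as pd


def income_share_by(df, feature, target="salary"):
    """Return, for each value of feature, the share of rows in each income class.

    Rows of the result are feature values, columns are income classes,
    and every row sums to 1.
    """
    return pd.crosstab(df[feature], df[target], normalize="index")
```

With matplotlib installed, `income_share_by(df, "workclass").plot(kind="bar", stacked=True)` would draw the table as a stacked bar chart.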

Output:

Conclusion

The Income Prediction project applies a systematic, modular approach to a machine learning problem. It compares multiple models, transforms the data in a dedicated step, and deploys the best-performing model through Flask.


samagra44/Income-Precision-Analytics