Ingestion and data analysis: Python

This project can modify almost any type of structured file and send the data to an endpoint through a POST request (typically a REST API request). The project is flexible: you control its behavior through the .cfg file. You can also produce a simple statistics file about the data in your file before any data mining action.

First steps

Setup

These are the instructions to set up the project in your local environment. Steps 3 and 4 are optional.

I used Python 3.6 to run the commands in this project, so if you prefer (or already have installed) a Python 2.* version, some edits are required.

In that case, look at the comments in the code to fix the environment before you start.

Prerequisites

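Install every third-party library used in the files:

pandas
requests
xlrd

The json, time and datetime modules are also used, but they are part of the Python standard library and need no installation.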

You can use the pip command:

- Windows: python -m pip install library-name
- Linux: pip install library-name
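To see which libraries still need installing, a small check like the one below can help. It is a minimal sketch and not part of the project; the helper name `missing_libraries` is hypothetical.

```python
import importlib.util


def missing_libraries(names):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Third-party libraries from the list above; json, time and datetime ship with Python.
    required = ["pandas", "requests", "xlrd"]
    for name in missing_libraries(required):
        print(f"missing: {name} (install it with pip)")
```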

Steps

Git clone the repository into your folder.

 git clone https://github.com/your_username/data-ingestion.git

Copy project.cfg.example to project.cfg

 Use the GENERAL section to set up the file information used in steps 3 and 4.
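As an illustration of how such a file can be read, here is a minimal sketch that assumes project.cfg uses the INI syntax understood by Python's `configparser`. The section names come from this README; every key and value shown, and the helper `load_settings`, are hypothetical.

```python
import configparser

# Hypothetical file content: the section names come from this README,
# but every key and value below is an assumption for illustration.
EXAMPLE_CFG = """
[GENERAL]
file_name = input.xlsx
file_type = excel

[SUMMARY]
output_name = summary.csv
"""


def load_settings(text):
    """Parse .cfg text and return each section as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


settings = load_settings(EXAMPLE_CFG)
print(settings["GENERAL"]["file_name"])
```

Check project.cfg.example for the real keys before relying on any of the names above.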

Launch the summary.py file to get basic statistics about the whole dataset or about particular columns. In the project.cfg file, set the columns you want to analyze in the STATISTICSCOLS section. The script generates a .csv file in the statistics folder, with the name set in the SUMMARY section.
```
 python summary.py
```
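The kind of statistics this step produces can be sketched with pandas as follows. This is only an approximation under assumptions: the function `summarize` and the sample data are hypothetical, and the real script may compute different figures.

```python
import pandas as pd


def summarize(df, columns=None):
    """Return basic statistics for the whole frame or only the given columns."""
    subset = df[columns] if columns else df
    # include="all" also reports counts and unique values for text columns
    return subset.describe(include="all")


# Hypothetical data; the real script reads the file named in project.cfg.
data = pd.DataFrame({"age": [20, 30, 40], "city": ["Rome", "Milan", "Rome"]})
stats = summarize(data, columns=["age"])

# Without a path, to_csv returns the CSV text; passing a path writes a file.
csv_text = stats.to_csv()
print(csv_text)
```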
Launch the mining.py file to run data mining and correct the dataset by changing column names, modifying values, merging columns or dropping columns. In the project.cfg file, set the flag (in the GENERAL section) that says whether the dataset needs any edits, then set the columns you want to modify (MODIFIERSCOLS section), the values to change (MODIFIERSVALUES section), the columns to merge (MERGECOLS section) and the column(s) to drop. The script generates a data.csv file in the files folder, plus log files that trace every step.
```
 python mining.py
```
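The four kinds of edits described for this step map naturally onto pandas operations. The sketch below is an assumption about how they could look, not the project's code: the function `clean`, its parameters, the space used to join merged columns and the sample data are all hypothetical.

```python
import pandas as pd


def clean(df, rename=None, replace=None, merge=None, drop=None):
    """Rename columns, replace values, merge columns, then drop columns."""
    out = df.rename(columns=rename or {})  # returns a new frame
    # replace maps column name -> {old value: new value}
    for column, mapping in (replace or {}).items():
        out[column] = out[column].replace(mapping)
    # merge maps new column name -> list of columns joined with a space
    for new_column, parts in (merge or {}).items():
        out[new_column] = out[parts].astype(str).agg(" ".join, axis=1)
    return out.drop(columns=drop or [])


# Hypothetical input data
raw = pd.DataFrame(
    {"nome": ["Ann", "Bob"], "surname": ["Lee", "Ray"], "sex": ["F", "M"], "tmp": [1, 2]}
)
cleaned = clean(
    raw,
    rename={"nome": "name"},
    replace={"sex": {"F": "female", "M": "male"}},
    merge={"full_name": ["name", "surname"]},
    drop=["tmp"],
)
print(cleaned)
```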
Launch the ingestion.py file to finally send the data in your file to an endpoint (for example, one of your applications whose data you want to extend). In the project.cfg file, set every parameter (in the INGESTION section) needed to send the data. This step generates an errors .csv file in the history_errors folder; its name is incremental and built from date_hour_minute, so every error file is kept and can be reused. It also produces log files that trace every step.
```
 python ingestion.py
```
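The send-and-record-errors flow of this step can be sketched as below. It is a rough illustration under assumptions: the functions `send_rows`, `error_file_name` and `write_errors`, the one-request-per-row strategy, the 10 second timeout and the exact file name format are hypothetical; only `requests.post` and the date_hour_minute idea come from the libraries and text above.

```python
import csv
from datetime import datetime


def send_rows(rows, endpoint, post=None):
    """POST each row as JSON and return the rows the endpoint rejected."""
    if post is None:
        import requests  # imported lazily so a stub can be passed in instead
        post = requests.post
    failed = []
    for row in rows:
        response = post(endpoint, json=row, timeout=10)
        if response.status_code >= 400:
            failed.append(row)
    return failed


def error_file_name(now=None):
    """Build a date_hour_minute file name (assumed format)."""
    now = now or datetime.now()
    return now.strftime("%Y%m%d_%H_%M") + "_errors.csv"


def write_errors(failed, handle):
    """Write rejected rows as CSV so they can be fixed and sent again."""
    if failed:
        writer = csv.DictWriter(handle, fieldnames=list(failed[0]))
        writer.writeheader()
        writer.writerows(failed)
```

Keeping the rejected rows in a timestamped file means a later run can read that file as input and retry only the failures.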

gsaraceno92/data-ingestion
