Processing data in Python, SQL, building a report

Python project

1. Python project - description

1.	Write a script 'append_tables.py' that
	1.1 reads data from all the CSV files
	1.2 appends all data into single table
	1.3 first column will be 'Date' (taken from the file name)
	1.4 prints number of files, number of files with data in it, total number of rows
	1.5 stores the final table into csv as 'user_actions_{today}.csv' e.g. 'user_actions_2023-10-17.csv'
	
2. 	Write a function 'clean_cols' in 'helpers.py' that cleanses the column names e.g. 'Unique Visitor ID [v78] (evar78)' -> 'Unique Visitor ID'
	2.1 this function should be imported and used in the main script 'append_tables.py'

1. Python project - solution

The output consists of 2 Python files (append_table.py and helpers.py) that are able to read all CSV files, append all data into one table, modify the first column as required, print all results and store the final table into the required format.

IMPORTANT: The append_table.py is defined as a Python class. To run the script you just need to specify the location of the data. You can run in your terminal via the following command:

python append_tables.py "data_folder_path"

1. Python project - output files

append_tables.py: Python class that reads, appends, modifies as required and stores all data, plus print necessary results
helpers.py: Python function that cleanses the column names (imported in the append_tables.py)
user_actions_2024-02-14.csv: Final csv file
Command_line_example.png: Screenshot with results and showing what the command to run the script could look like

SQL project

2. SQL project - description

INSTRUCTIONS TO COMPLETE THE SQL TESTING:

1.	Download any free database tool (example: https://dbeaver.io/download/)
2. 	Your dataset is "data.sqlite"
3. 	Compose 2 SQL queries
4.	Query 1 - select the top 10 users from the available tables. 
              Return fields - Unique Visitor ID, First Name, Last Name, COUNT (user occurence in USER_ACTION table), sort results by COUNT
5.  Query 2 - from the available tables select PERIOD, GenClick Cleansed as CATEGORY, FIRST NAME, LAST NAME, COUNT (rows)

		Query conditions:
	
		5.1.	LAST NAME is either Edwards, or Harris
		5.2.	PERIOD contains string 'JUN'

		QUESTION:	How many rows does the query return?
6.	After you are finished, please send us both SQL queries.

2. SQL project - solution

The output consists of 2 SQL queries that should cover the requirements mentioned above.

QUERY 2: The query returns 17 rows.

2. SQL project - output files

1st_query.sql: Selecting unique visitor ID, first name, last name, and counting occurrences
2nd_query.sql: Selecting period, category, first name, last name, and counting rows

Excel project

3. Excel project - description

Report: This tab contains a table information. The table must contain all avaible columns from tab Data 1 and columns that will be merged with the information from Data 2.
Pivot: It displays 3 pivot tables
User running this report needs to be able to refresh previously created queries/tables in a single click as well as share the report with others. For a specific purpose another user receiving this report expects to have list of unique categories in the 'dropdown' sheet.

3. Excel project - solution

The solution has been prepared in 2 versions - In Excel (I do not have paid version of Excel, so I worked on this task in Google Sheets which could be an alternative to Excel) and via Web application in Python.

Start the Application

Directly (using streamlit library)

First, make sure you have Streamlit and Pandas installed (e.g. pip install streamlit). To run the app, navigate to the directory containing the script in your terminal and run:

streamlit run app.py

Using a Dockerfile

You can also use a Dockerfile to build the image for the scraper script. Open a terminal and navigate to the directory containing your Dockerfile and Streamlit app files. Build the image using the following command:

docker build -t my_streamlit_app .

You can replace "my_streamlit_app" with the desired name for your Docker image.

Run the Docker container: Once the image is built, you can run it as a container:

docker run -p 8501:8501 my_streamlit_app

This command maps port 8501 of your local machine to port 8501 inside the container, where Streamlit is running.

Access your Streamlit app: Open a web browser and go to http://localhost:8501 to access your Streamlit app running inside the Docker container.

3. Excel project - possible improvements (if there is more time)

Python files could be refactored and optimized
The layout could be adapted to the user requeirements
Results could be direcly send to the email recipient
One pivot table is missing (due to lack of time)
Storing to Excel could be fixed as well

3. Excel project - files

app_functions.py: Functions to calculate information for the report
app.py: Web application written in Python
Dockerfile: This file contains instructions for building your Docker image.
requirements.txt: This file contains all the Python libraries your Streamlit app depends on.
Report_Feb14_2024.xlsx: This file contains merged data from Data 1 and Data 2 (in the column "Report data"), then 3 pivot tables and a sheet called "dropdown" using function UNIQUE to get 'Data 2' column 'O' (Category) in column 'A' in this sheet called 'dropdown'

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
Attachments		Attachments
Excel_project		Excel_project
Python_project		Python_project
SQL_project		SQL_project
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Processing data in Python, SQL, building a report

Python project

1. Python project - description

1. Python project - solution

1. Python project - output files

SQL project

2. SQL project - description

2. SQL project - solution

2. SQL project - output files

Excel project

3. Excel project - description

3. Excel project - solution

Start the Application

Directly (using streamlit library)

Using a Dockerfile

3. Excel project - possible improvements (if there is more time)

3. Excel project - files

About

Uh oh!

Releases

Packages

Uh oh!

Languages

maon007/Data_Integration_and_Analysis_Python-SQL-Excel

Folders and files

Latest commit

History

Repository files navigation

Processing data in Python, SQL, building a report

Python project

1. Python project - description

1. Python project - solution

1. Python project - output files

SQL project

2. SQL project - description

2. SQL project - solution

2. SQL project - output files

Excel project

3. Excel project - description

3. Excel project - solution

Start the Application

Directly (using streamlit library)

Using a Dockerfile

3. Excel project - possible improvements (if there is more time)

3. Excel project - files

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages