# Data Wrangler & Analyzer
**Author**: Enrique Sanchez

Data wrangling and analysis can be long and complex, especially if you are not familiar with the tools involved. I developed this notebook to simplify the process. No prior programming experience is required.

With this notebook you can drop columns and rows, edit the headers, geocode addresses, explore column statistics, filter the data set, and visualize it.

This notebook was developed using primarily [Pandas](https://pandas.pydata.org/) and [Holoviz](https://holoviz.org/). If you have any feedback, questions, or simply want to get in touch, then feel free to email me at ens004@ucsd.edu. Enjoy!

**If you are not familiar with Jupyter Notebook:**
- Run each cell using the `Run` button on the toolbar or by pressing `Ctrl+Enter`
--------

Before we begin, run this cell to import necessary libraries and scripts.

In [None]:
import pandas as pd
import panel as pn
import FileScript as fs
import HoloV as ho
import GeoTools as gt

## 1. Upload Data

Please run the cell below and upload a data set of your choice. Currently, this notebook supports `.csv`, `.tsv`, and `.txt` files.

Don't have a data set but want to take full advantage of this notebook? Download this sample data set <a href="power.csv" download>here</a>. This is a military strength ranking data set derived from [Kaggle](https://www.kaggle.com/blitzr/gfp2017).

In [None]:
data = pn.widgets.FileInput()
data

Now let's run this next cell to view our data!

In [None]:
data.save(data.filename)
ho.view_data(data.filename)
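
The internals of the helper modules are not shown in this notebook, so the following is only a minimal sketch of how an uploaded file of one of the supported types might be read with Pandas. The function name `read_table_by_extension` and the choice of a tab delimiter for `.txt` files are assumptions for illustration, not the behavior of `ho.view_data`.

```python
import io
import os

import pandas as pd

# Hypothetical mapping from file extension to delimiter.
# Treating .txt as tab-separated is an assumption.
DELIMITERS = {".csv": ",", ".tsv": "\t", ".txt": "\t"}


def read_table_by_extension(filename, raw_bytes):
    """Parse uploaded bytes into a DataFrame based on the file extension."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in DELIMITERS:
        raise ValueError(f"Unsupported file type: {ext!r}")
    return pd.read_csv(io.BytesIO(raw_bytes), sep=DELIMITERS[ext])


# Small in-memory example standing in for an uploaded file.
sample = b"country,rank\nA,1\nB,2\n"
df = read_table_by_extension("sample.csv", sample)
print(df.head())
```

With the Panel `FileInput` widget above, the uploaded bytes are available as `data.value` and the name as `data.filename`, so a call such as `read_table_by_extension(data.filename, data.value)` would be one way to get a DataFrame without saving the file first.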

## 2. Modify Data

Here you can drop rows and columns of your choice and modify the headers if needed.

Once you are satisfied with the data set, please click `Finish & Save Data`.

In [None]:
fs.modify_data(ho.original_df, data.filename)
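
For readers curious about what these edits look like in plain Pandas, here is a rough sketch of dropping a column, dropping a row, and renaming headers. The example DataFrame and its column names are made up for illustration; this is not the code behind `fs.modify_data`.

```python
import pandas as pd

# Made-up example data.
df = pd.DataFrame(
    {"Country": ["A", "B", "C"], "Rank": [1, 2, 3], "Notes": ["x", "y", "z"]}
)

# Drop a column by name and a row by index label.
# Both calls return new DataFrames and leave df unchanged.
modified = df.drop(columns=["Notes"]).drop(index=[1])

# Rename headers with an old-name -> new-name mapping.
modified = modified.rename(columns={"Country": "country", "Rank": "rank"})

# Reset the index so the row labels are consecutive again.
modified = modified.reset_index(drop=True)
print(modified)
```

Because each step returns a new DataFrame, the original data stays available, which mirrors the notebook's choice between the `Original` and `Saved` data sets.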

## 3. Select Data Set

Select the data set you want to work with. This can be either the data set `Saved` from above or the `Original` data set uploaded. 

**Note:** You can always come back here and change your selection.

In [None]:
ho.select_data()

## 4. Geocoder

Do you have a location/address column in your data set? 

- If no, you may continue on to the next section.
- If yes, geocode this column to generate new latitude and longitude columns for your data set! Later when visualizing, the option to display coordinates on a map will be available.

**Note:** If latitude and longitude columns are already detected in your data set, this feature will be unavailable.

In [None]:
gt.geocoder(ho.df)
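
The idea behind this step can be sketched as follows, assuming a lookup function that maps an address to a `(latitude, longitude)` pair. The names `has_coordinates` and `add_coordinates` are hypothetical, and the dictionary lookup is only a stand-in for a real geocoding service; how `gt.geocoder` works internally is not shown in this notebook.

```python
import pandas as pd


def has_coordinates(df):
    """Return True if the frame already has latitude and longitude columns."""
    names = {str(c).lower() for c in df.columns}
    return {"latitude", "longitude"} <= names


def add_coordinates(df, address_col, lookup):
    """Return a copy of df with latitude/longitude columns filled via lookup.

    lookup is any callable that maps an address string to a (lat, lon)
    tuple, or returns None when the address cannot be resolved.
    """
    if has_coordinates(df):
        return df.copy()  # nothing to do, mirroring the note above
    coords = [lookup(addr) or (None, None) for addr in df[address_col]]
    out = df.copy()
    out["latitude"] = [lat for lat, _ in coords]
    out["longitude"] = [lon for _, lon in coords]
    return out


# Stand-in for a real geocoding service (approximate coordinates).
known = {"San Diego, CA": (32.7157, -117.1611)}
places = pd.DataFrame({"city": ["San Diego, CA", "Nowhere"]})
geocoded = add_coordinates(places, "city", known.get)
print(geocoded)
```

Addresses that cannot be resolved end up with missing values in both new columns, so they can be filtered out before mapping.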

## 5. Explore

Now it's time to explore the data. You may view individual column statistics on the entire data set or a filtered portion of the data set.

Simply modify the expression and variable widgets accordingly.

In [None]:
ho.explore_data()
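
As a rough Pandas equivalent of this step, the sketch below computes statistics for one column over the whole data set and then over a filtered portion. The example data and the filter expression are made up, and this is not necessarily how `ho.explore_data` is implemented.

```python
import pandas as pd

# Made-up example data.
df = pd.DataFrame(
    {"country": ["A", "B", "C", "D"], "budget": [10.0, 20.0, 30.0, 40.0]}
)

# Summary statistics for one column over the entire data set.
full_stats = df["budget"].describe()

# The same statistics over a filtered portion, selected with an expression.
filtered = df.query("budget >= 30")
filtered_stats = filtered["budget"].describe()

print(full_stats)
print(filtered_stats)
```

`describe()` reports the count, mean, standard deviation, minimum, quartiles, and maximum for a numeric column, which makes it easy to compare the full and filtered views side by side.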

## 6. Visualize

Here we can visualize the data!

Available plots include:
- **Univariate:** histogram, box plot, density plot
- **Multivariate:** scatter plot
- **Spatial:** map (if latitude and longitude columns exist)

In [None]:
ho.visualize()