# General Analysis Ideas

### Which fruits/vegetables have the highest/lowest sales volume and revenue?
Identify the top- and bottom-selling products by volume and by revenue to see which items are the most and least popular.
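
A minimal pandas sketch of this ranking, assuming the column names of the exported CSV (`Producto`, `Volumen`, `Total`). The rows and the names `totals`, `by_volume` and `by_revenue` are illustrative, not results from the dataset.

```python
import pandas as pd

# Toy rows using the dataset's column names; values are illustrative only
df = pd.DataFrame({
    "Producto": ["Cebolla", "Cebolla", "Palta", "Palta", "Tomate"],
    "Volumen": [180.0, 2300.0, 50.0, 70.0, 400.0],
    "Total": [1_220_000.0, 20_309_990.0, 900_000.0, 1_300_000.0, 3_000_000.0],
})

# Sum volume and revenue per product
totals = df.groupby("Producto")[["Volumen", "Total"]].sum()

# Rank products separately by each metric (largest first)
by_volume = totals["Volumen"].sort_values(ascending=False)
by_revenue = totals["Total"].sort_values(ascending=False)

print("Top by volume:", by_volume.index[0], "| bottom:", by_volume.index[-1])
print("Top by revenue:", by_revenue.index[0], "| bottom:", by_revenue.index[-1])
```

`totals.nlargest(10, "Total")` and `totals.nsmallest(10, "Total")` return top-N and bottom-N lists directly. Because `Volumen` is recorded in different packaging units, volume rankings are only directly comparable between products sold in similar units.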

### What are the sales trends for specific fruits/vegetables over time?
Analyze weekly sales data to identify seasonal trends, demand patterns, and potential growth opportunities.
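
Weekly series per product can be sketched with `pd.Grouper`, assuming `Fecha` holds dates as in the export. `freq="W"` bins dates into weeks ending on Sunday, labeled by that Sunday. The toy rows and the names `weekly` and `growth` are illustrative.

```python
import pandas as pd

# Toy rows; 2018-09-24 is a Monday, as in the export
df = pd.DataFrame({
    "Producto": ["Cebolla"] * 4 + ["Palta"] * 2,
    "Fecha": ["2018-09-24", "2018-09-26", "2018-10-01", "2018-10-03",
              "2018-09-24", "2018-10-01"],
    "Volumen": [100.0, 80.0, 120.0, 90.0, 30.0, 45.0],
})
df["Fecha"] = pd.to_datetime(df["Fecha"])

# Weekly volume: one row per week (ending Sunday), one column per product
weekly = (df.groupby(["Producto", pd.Grouper(key="Fecha", freq="W")])["Volumen"]
            .sum()
            .unstack("Producto"))

# Week-over-week growth: ratio to the previous week minus one
growth = weekly / weekly.shift(1) - 1
print(weekly)
print(growth)
```

A smoother trend line could use `weekly.rolling(4).mean()`, and comparing the same weeks across years helps separate seasonality from growth.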

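### What is the average price per unit and the average volume sold for each fruit/vegetable?
Calculate the average price and volume per product to understand typical market conditions. Since products are sold in different units (e.g. MALLA 18 KILOS vs. CAJA 10 UNIDADES), normalize prices to a common unit, such as price per kilo, before comparing them.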
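
This sketch contrasts a simple mean of quoted prices with a volume-weighted price (sum of `Total` over sum of `Volumen`) and a per-unit price computed as `Precio` / `Numero Unidad`, the same division the notebook uses for `Precio Unidad`. It assumes `Total` equals `Precio` times `Volumen`. The two Cebolla rows reuse values from the `df.head()` output; the Palta row and the English summary column names are made up.

```python
import pandas as pd

# Cebolla rows reuse Precio/Volumen/Numero Unidad values from df.head();
# the Palta row is made up
df = pd.DataFrame({
    "Producto": ["Cebolla", "Cebolla", "Palta"],
    "Precio": [6777.7759, 8830.4307, 20000.0],
    "Volumen": [180.0, 2300.0, 50.0],
    "Numero Unidad": [18, 16, 10],
})
df["Total"] = df["Precio"] * df["Volumen"]
# Price per unit of measure (per kilo for kilo-based packages)
df["Precio Unidad"] = df["Precio"] / df["Numero Unidad"]

g = df.groupby("Producto")
summary = pd.DataFrame({
    "mean_price": g["Precio"].mean(),                         # each quote counts equally
    "weighted_price": g["Total"].sum() / g["Volumen"].sum(),  # each package counts equally
    "mean_price_per_unit": g["Precio Unidad"].mean(),
    "mean_volume": g["Volumen"].mean(),
})
print(summary)
```

The volume-weighted price is higher for Cebolla here because the larger lot traded at the higher price; which average to report depends on whether each quote or each package should count equally.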

### Which markets show the highest/lowest sales for specific fruits/vegetables?
Determine which markets, and therefore which regions, have the highest and lowest sales to optimize distribution and marketing efforts.
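
A sketch for a single product, assuming `Mercado` holds the market name and `Total` the revenue. The market names come from the export; the figures and the names `product` and `by_market` are illustrative. Markets with no remaining rows for the product (for example after zero-volume rows are dropped) simply do not appear in the result.

```python
import pandas as pd

# Toy rows; market names appear in the export, revenue figures are illustrative
df = pd.DataFrame({
    "Producto": ["Cebolla", "Cebolla", "Cebolla", "Palta"],
    "Mercado": ["Femacal de La Calera", "Central Lo Valledor de Santiago",
                "Femacal de La Calera", "Vega Modelo de Temuco"],
    "Total": [1_220_000.0, 20_309_990.0, 500_000.0, 900_000.0],
})

product = "Cebolla"

# Revenue per market for the chosen product, highest first
by_market = (df[df["Producto"] == product]
             .groupby("Mercado")["Total"].sum()
             .sort_values(ascending=False))

print(by_market)
print("Highest:", by_market.idxmax(), "| lowest:", by_market.idxmin())
```

Joining `by_market` with each market's `Latitude` and `Longitude` would let the result be plotted on a map.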

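### What is the overall revenue and volume for all fruits/vegetables combined?
Summarize total revenue and volume to show the overall performance of the business. Revenue (`Total`) can be summed directly, but volume is recorded in different units, so it should be totaled per unit or converted to kilos first.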
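
A sketch of both totals. It assumes, based on `Precio` times `Volumen` matching `Total` in the sample rows, that `Volumen` counts packages and that `Numero Unidad` is the number of kilos per package when `Nombre Unidad` is `KILOS`. The rows, including the `UNIDADES` value for the box of 10 units, are illustrative.

```python
import pandas as pd

# Illustrative rows; assumes Volumen counts packages and, for KILOS units,
# Numero Unidad is the number of kilos per package
df = pd.DataFrame({
    "Unidad": ["MALLA 18 KILOS", "MALLA 16 KILOS", "CAJA 10 UNIDADES"],
    "Volumen": [180.0, 2300.0, 40.0],
    "Total": [1_220_000.0, 20_309_990.0, 150_000.0],
    "Numero Unidad": [18, 16, 10],
    "Nombre Unidad": ["KILOS", "KILOS", "UNIDADES"],
})

# Revenue is assumed to be in a single currency, so it can be summed directly
total_revenue = df["Total"].sum()

# Volume is in packages of different sizes: total it per packaging unit...
volume_by_unit = df.groupby("Unidad")["Volumen"].sum()

# ...or convert kilo-based packages to kilos before summing
is_kilo = df["Nombre Unidad"] == "KILOS"
total_kilos = (df.loc[is_kilo, "Volumen"] * df.loc[is_kilo, "Numero Unidad"]).sum()

print(f"Revenue: {total_revenue:,.0f}  Kilos: {total_kilos:,.0f}")
print(volume_by_unit)
```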

### Are there any price or volume fluctuations over time?
Analyze price and volume variation over time to identify factors that affect sales and to inform pricing strategies.
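
One way to quantify fluctuations is to average prices per month, then compare the coefficient of variation (standard deviation divided by mean) across products alongside the month-over-month change. This pandas sketch uses made-up prices; `freq="MS"` bins dates by calendar month, labeled by the first day of the month. The names `monthly`, `cv` and `mom_change` are illustrative.

```python
import pandas as pd

# Made-up prices: Cebolla is flat, Palta swings from month to month
df = pd.DataFrame({
    "Producto": ["Cebolla"] * 4 + ["Palta"] * 4,
    "Fecha": pd.to_datetime(["2019-01-07", "2019-02-04", "2019-03-04", "2019-04-01"] * 2),
    "Precio": [6000.0, 6000.0, 6000.0, 6000.0, 20000.0, 30000.0, 10000.0, 20000.0],
})

# Mean price per calendar month, one column per product
monthly = (df.groupby(["Producto", pd.Grouper(key="Fecha", freq="MS")])["Precio"]
             .mean()
             .unstack("Producto"))

# Coefficient of variation: a scale-free measure of how much prices move
cv = monthly.std() / monthly.mean()

# Relative change versus the previous month
mom_change = monthly / monthly.shift(1) - 1

print(cv)
print(mom_change)
```

The same code applies to `Volumen` by swapping the column name.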

### Which fruits/vegetables have the highest profit margins?
Calculate profit margins for each product to identify opportunities for maximizing profitability. This requires cost data in addition to prices and revenue.

### What are the best-selling fruits/vegetables in each market?
Determine the top products in each market to guide inventory management and marketing efforts.
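
A sketch that picks the top product by volume in each market: sum per (market, product) pair, sort, and keep the first row of each market. Market names come from the export; volumes and the names `per_pair` and `top` are illustrative.

```python
import pandas as pd

# Toy rows; volumes are illustrative
df = pd.DataFrame({
    "Mercado": ["Vega Central Mapocho de Santiago"] * 3 + ["Vega Monumental Concepción"] * 3,
    "Producto": ["Cebolla", "Palta", "Cebolla", "Cebolla", "Palta", "Palta"],
    "Volumen": [100.0, 300.0, 150.0, 500.0, 200.0, 100.0],
})

# Total volume for every market/product pair
per_pair = df.groupby(["Mercado", "Producto"], as_index=False)["Volumen"].sum()

# Sort by volume, then keep the first (largest) row within each market
top = (per_pair.sort_values("Volumen", ascending=False)
               .groupby("Mercado")
               .head(1)
               .set_index("Mercado")["Producto"])
print(top)
```

Replacing `head(1)` with `head(3)` (and dropping the final column selection) gives a top-3 list per market.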

### Can we identify any correlations between price and volume for specific products?
Conduct a correlation analysis to explore the relationship between price and volume for individual items.
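
A per-product Pearson correlation can be sketched with `groupby(...).apply`, selecting the two columns first so only they are passed to the function. The toy data are constructed so one product has a perfectly negative and the other a perfectly positive relationship; the name `corr` is illustrative.

```python
import pandas as pd

# Toy data: Cebolla volume falls as price rises, Palta volume rises with price
df = pd.DataFrame({
    "Producto": ["Cebolla"] * 4 + ["Palta"] * 4,
    "Precio": [5000.0, 6000.0, 7000.0, 8000.0, 10000.0, 12000.0, 14000.0, 16000.0],
    "Volumen": [400.0, 300.0, 200.0, 100.0, 10.0, 20.0, 30.0, 40.0],
})

# One Pearson coefficient per product
corr = (df.groupby("Producto")[["Precio", "Volumen"]]
          .apply(lambda g: g["Precio"].corr(g["Volumen"])))
print(corr)
```

Pearson correlation measures linear association; passing `method="spearman"` to `Series.corr` gives a rank-based alternative. Correlation alone does not say whether price drives volume or the reverse.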

### What is the overall market share of each fruit/vegetable?
Calculate each product's share of total revenue (or volume) to understand its position relative to the other products.
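
A sketch of revenue share per product: each product's summed `Total` divided by the grand total. The figures and the names `revenue` and `share` are illustrative.

```python
import pandas as pd

# Illustrative revenue rows
df = pd.DataFrame({
    "Producto": ["Cebolla", "Cebolla", "Palta", "Tomate"],
    "Total": [1_000_000.0, 1_000_000.0, 1_500_000.0, 500_000.0],
})

# Revenue per product, then each product's fraction of the total
revenue = df.groupby("Producto")["Total"].sum()
share = (revenue / revenue.sum()).sort_values(ascending=False)
print(share)
```

Here "market share" means share of revenue across the products in these wholesale markets; using `Volumen` instead gives share by volume, subject to the mixed packaging units.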

### How do pricing strategies impact sales volume and revenue?
Analyze how price changes affect sales volume and revenue to optimize pricing strategies.
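
One common way to quantify how price changes affect volume is the price elasticity of demand, estimated as the slope b of a log-log regression, log Q = a + b log P. This sketch fits it with `numpy.polyfit` on synthetic data generated with an elasticity of -1.5, so the fit should recover that value. It illustrates the technique and is not the author's method.

```python
import numpy as np

# Synthetic prices and a constant-elasticity demand curve Q = A * P**-1.5
price = np.array([4000.0, 5000.0, 6000.0, 8000.0, 10000.0])
volume = 1e8 * price ** -1.5

# Degree-1 fit on logs; polyfit returns coefficients highest degree first
slope, intercept = np.polyfit(np.log(price), np.log(volume), 1)
print(f"estimated elasticity: {slope:.2f}")
```

With b < -1, raising the price lowers revenue (P times Q); with -1 < b < 0, it raises revenue. On observational market data, price and volume both respond to supply shocks, so the fitted slope is descriptive rather than a causal demand elasticity.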

### Which fruits/vegetables have the highest customer satisfaction or repeat purchase rates?
Use customer feedback data to identify products with high satisfaction levels and loyal customers.

### Can we forecast future sales for specific products based on historical data?
Use time series analysis and forecasting techniques to predict future sales for individual items.
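
Before fitting a time series model, it can help to measure a naive baseline that any model should beat. This sketch, on a synthetic weekly series, holds out the last four weeks, forecasts each of them with the mean of the four preceding weeks, and scores the forecast with mean absolute error (MAE). The series, the four-week window and the names `train`, `test` and `forecast` are arbitrary choices for illustration.

```python
import pandas as pd

# Synthetic weekly volume series (illustrative only)
idx = pd.date_range("2022-01-02", periods=12, freq="W")
weekly = pd.Series([100, 110, 105, 120, 115, 125, 130, 128, 135, 140, 138, 145],
                   index=idx, dtype=float)

# Hold out the last four weeks for evaluation
train, test = weekly.iloc[:-4], weekly.iloc[-4:]

# Baseline: every test week is forecast with the mean of the last 4 training weeks
forecast = pd.Series(train.iloc[-4:].mean(), index=test.index)

# Mean absolute error of the baseline
mae = (test - forecast).abs().mean()
print(f"baseline MAE: {mae:.1f}")
```

A model such as exponential smoothing or ARIMA is only worth its extra complexity if it clearly improves on this MAE over the same holdout.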

### What are the most profitable markets for each fruit/vegetable?
Compare profit margins across markets to identify lucrative opportunities. This requires cost data alongside revenue.

### Are there any product combinations that lead to increased sales?
Analyze cross-selling patterns to identify potential product bundling opportunities. This requires transaction-level (basket) data showing which products are bought together.

```python
import numpy as np
import pandas as pd
import calendar
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go  # current name of the deprecated plotly.graph_objs
```

```python
df = pd.read_csv('./exported_data_2018-09-24_to_2023-07-14_V1.2.csv')
df.head()
```

| Unnamed: 0.1 | Unnamed: 0 | Variedad | Mercado | Unidad | Dia | Precio | Fecha | Volumen | Producto | Total | Latitude | Longitude | Numero Unidad | Nombre Unidad |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Morada | Vega Modelo de Temuco | MALLA 18 KILOS | Lunes | 0.0 | 2018-09-24 | 0.0 | Cebolla | 0.0 | -38.693615 | -72.524888 | 18 | KILOS |
| 1 | 1 | Sin especificar | Femacal de La Calera | MALLA 18 KILOS | Lunes | 6777.7759 | 2018-09-24 | 180.0 | Cebolla | 1220000.0 | -32.785623 | -71.188495 | 18 | KILOS |
| 2 | 2 | Sin especificar | Vega Central Mapocho de Santiago | MALLA 18 KILOS | Lunes | 0.0 | 2018-09-24 | 0.0 | Cebolla | 0.0 | -33.427612 | -70.649499 | 18 | KILOS |
| 3 | 3 | Sin especificar | Vega Monumental Concepción | MALLA 18 KILOS | Lunes | 0.0 | 2018-09-24 | 0.0 | Cebolla | 0.0 | -36.807464 | -73.071571 | 18 | KILOS |
| 4 | 4 | Sin especificar | Central Lo Valledor de Santiago | MALLA 16 KILOS | Lunes | 8830.4307 | 2018-09-24 | 2300.0 | Cebolla | 20309990.0 | -33.481183 | -70.68251 | 16 | KILOS |
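
In the rows shown, `Total` matches `Precio` times `Volumen` up to rounding (6777.7759 × 180 ≈ 1,220,000), which suggests `Precio` is quoted per package and `Volumen` counts packages. The two `Unnamed` columns look like saved row indices, which typically appear when a DataFrame is written with `to_csv` without `index=False`. This sketch checks the relationship on the two non-zero sample rows and drops the extra columns; on the full file the same `drop(columns=...)` call can follow `pd.read_csv`.

```python
import numpy as np
import pandas as pd

# The two non-zero rows from the df.head() output
sample = pd.DataFrame({
    "Unnamed: 0.1": [1, 4],
    "Unnamed: 0": [1, 4],
    "Precio": [6777.7759, 8830.4307],
    "Volumen": [180.0, 2300.0],
    "Total": [1220000.0, 20309990.0],
})

# Drop the leftover index columns
sample = sample.drop(columns=["Unnamed: 0.1", "Unnamed: 0"])

# Total agrees with Precio * Volumen to within a relative tolerance of 1e-6
matches = np.isclose(sample["Precio"] * sample["Volumen"], sample["Total"], rtol=1e-6)
print(matches.all())
```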


```python
# Number of rows with non-positive vs. positive price
df[df['Precio'] <= 0].shape[0], df[df['Precio'] > 0].shape[0]
```

(68199, 112639)

```python
# Number of rows with non-positive vs. positive volume
df[df['Volumen'] <= 0].shape[0], df[df['Volumen'] > 0].shape[0]
```

(68289, 112772)
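
The two pairs of counts add up to different totals (68,199 + 112,639 = 180,838 for `Precio`; 68,289 + 112,772 = 181,061 for `Volumen`). Comparisons with `NaN` always return `False`, so rows with a missing value fall into neither bucket; the gap therefore suggests that `Precio` has 223 more missing values than `Volumen`. The toy Series here illustrates the effect; on the real data, `df[['Precio', 'Volumen']].isna().sum()` reports the missing counts directly.

```python
import numpy as np
import pandas as pd

# Toy values: two non-positive, one positive, one missing
s = pd.Series([0.0, 5.0, np.nan, -1.0])

le0 = (s <= 0).sum()      # NaN <= 0 is False
gt0 = (s > 0).sum()       # NaN > 0 is also False
missing = s.isna().sum()  # so NaN is only counted here

# The two comparison counts cover everything except the missing values
print(le0, gt0, missing, len(s))
```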

```python
# Drop rows with non-positive volume (rows where Volumen is NaN are not dropped)
df_drop = df.drop(index=df[df['Volumen'] <= 0].index)
df_drop[df_drop['Volumen'] <= 0].shape[0], df_drop[df_drop['Volumen'] > 0].shape[0]
```

(0, 112772)

```python
# Drop rows with non-positive price (rows where Precio is NaN are not dropped)
df_drop = df_drop.drop(index=df_drop[df_drop['Precio'] <= 0].index)
df_drop[df_drop['Precio'] <= 0].shape[0], df_drop[df_drop['Precio'] > 0].shape[0]
```

(0, 112603)
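
Because each `drop` call only removes rows where the comparison is `True`, rows with a missing `Precio` or `Volumen` survive both steps. A single boolean mask that keeps only rows where both values are strictly positive also excludes `NaN`. This sketch compares the two approaches on a toy frame; the names `clean` and `two_step` are illustrative.

```python
import numpy as np
import pandas as pd

# Toy frame: one zero row, two valid rows, one missing Precio, one missing Volumen
df = pd.DataFrame({
    "Precio": [0.0, 6777.7759, np.nan, 8830.4307, 500.0],
    "Volumen": [0.0, 180.0, 50.0, 2300.0, np.nan],
})

# Keep rows where both values are strictly positive; NaN compares False,
# so rows with a missing Precio or Volumen are excluded too
clean = df[(df["Precio"] > 0) & (df["Volumen"] > 0)]

# The two-step drop() approach keeps the rows with NaN
two_step = df.drop(index=df[df["Volumen"] <= 0].index)
two_step = two_step.drop(index=two_step[two_step["Precio"] <= 0].index)

print(len(clean), len(two_step))
```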

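```python
a = df_drop.copy()
# Price per unit of measure: package price divided by package size
# (e.g. per kilo for 'MALLA 18 KILOS', where Numero Unidad is 18)
a['Precio Unidad'] = a['Precio'] / a['Numero Unidad']
a['Fecha'] = pd.to_datetime(a['Fecha'])
a["Ano"] = a.Fecha.dt.year   # year
a["Mes"] = a.Fecha.dt.month  # month

# Keep only Hass avocados (Palta) sold in kilo-based packaging
a = a[a['Producto'] == 'Palta']
a = a[a['Variedad'] == 'Hass']
a = a[a['Nombre Unidad'] == 'KILOS']
```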
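
Some unit labels, such as 'KILO (EN CAJA DE 15 KILOS)', read as prices quoted per kilo. If such rows carry the box size in `Numero Unidad`, dividing by it would understate their price. This sketch shows one hypothetical guard before averaging the per-kilo price by year and month; the toy rows, the `Numero Unidad` of 15 for the KILO label, and the name `already_per_kilo` are assumptions. Checking the real values first, e.g. with `df_drop.groupby('Unidad')['Numero Unidad'].unique()`, would confirm whether the guard is needed.

```python
import pandas as pd

# Toy Palta Hass rows; unit labels appear in the dataset's Unidad values,
# while prices and the Numero Unidad of 15 for the KILO label are assumptions
a = pd.DataFrame({
    "Unidad": ["CAJA 10 KILOS", "CAJA 10 KILOS", "KILO (EN CAJA DE 15 KILOS)", "CAJA 10 KILOS"],
    "Numero Unidad": [10, 10, 15, 10],
    "Precio": [25000.0, 30000.0, 2800.0, 32000.0],
    "Fecha": pd.to_datetime(["2019-01-07", "2019-01-14", "2019-01-21", "2019-02-04"]),
})

# Hypothetical guard: treat "KILO (...)" labels as already priced per kilo,
# and divide every other package price by its size in kilos
already_per_kilo = a["Unidad"].str.startswith("KILO (")
a["Precio Unidad"] = a["Precio"].where(already_per_kilo, a["Precio"] / a["Numero Unidad"])

a["Ano"] = a["Fecha"].dt.year
a["Mes"] = a["Fecha"].dt.month

# Average per-kilo price for each year and month
monthly = a.groupby(["Ano", "Mes"])["Precio Unidad"].mean()
print(monthly)
```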

```python
df_drop['Unidad'].unique()
```

array(['MALLA 18 KILOS', 'MALLA 16 KILOS', 'CAJA 10 UNIDADES',
       'CAJA 15 UNIDADES', 'BIN (400 KILOS)', 'CAJA 16 KILOS EMPEDRADA',
       'CAJA 15 KILOS GRANEL', 'KILO (EN CAJA DE 15 KILOS)',
       'KILO (EN CAJA DE 17 KILOS)', 'SACO 25 KILOS', 'MALLA 25 KILOS',
       'BIN (450 KILOS)', 'CAJA 18 KILOS EMPEDRADA', 'BANDEJA 18 KILOS',
       'CAJA 12 KILOS', 'SACO 20 KILOS', '1 UNIDADES', 'MALLA 15 KILOS',
       'PAQUETE 20 UNIDADES', 'PAQUETE 20 UNIDADES (VOLUMEN EN UNIDADES)',
       'BANDEJA 10 KILOS', '$/PAQUETE 20 UNIDADES (VOLUMEN EN UNIDADES)',
       'CAJA 16 KILOS', 'CAJA 20 KILOS', 'CAJA 15 KILOS',
       'CAJA 15 KILOS EMPEDRADA', 'MALLA 17 KILOS',
       'KILO (EN CAJA DE 18 KILOS)', 'BANDEJA 8 KILOS', 'CAJA 14 KILOS',
       'CAJA 18 KILOS', 'CAJA 10 KILOS', 'BANDEJA 9 KILOS',
       'BANDEJA 12 KILOS', 'MALLA 20 KILOS',
       'PAQUETE 10 UNIDADES (VOLUMEN EN UNIDADES)', 'MALLA 13 KILOS',
       'BANDEJA 15 KILOS GRANEL', 'BANDEJA 18 KILOS GRANEL',
       'CAJA 13 K