<a href="https://colab.research.google.com/github/monoxgit/test/blob/master/Ejercicio_iterrows.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Problem using iterrows

`iterrows()` is a method in pandas, a popular data analysis library in Python. It is used to iterate over the rows of a DataFrame. Here's how it works:

A DataFrame in pandas is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns). `iterrows()` is a way to loop through each row of a DataFrame one at a time. For each iteration, it returns an index and a pandas Series representing the row data.

Here's an example of how you might use `iterrows()`:

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'San Francisco', 'Los Angeles']}

df = pd.DataFrame(data)

for index, row in df.iterrows():
    print(index, row['Name'], row['Age'], row['City'])
```

In this example, `iterrows()` allows you to loop through the DataFrame `df` row by row, extracting and processing the data in each row.
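To make the "index plus Series" description concrete, the following minimal sketch pulls a single `(index, row)` pair out of the same example DataFrame and inspects it. Using `next()` on the iterator is only an illustrative shortcut to avoid a full loop.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'City': ['New York', 'San Francisco', 'Los Angeles']})

# Take only the first (index, row) pair to inspect it.
index, row = next(df.iterrows())

print(type(row).__name__)   # the row is a pandas Series
print(row.name == index)    # the Series is named after the row label
print(list(row.index))      # the column names become the Series index
```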

However, there are some important considerations to keep in mind when using `iterrows()`:

1. **Performance**: `iterrows()` can be slow, especially for large DataFrames. It builds a new pandas Series for every row, and the loop body then runs as ordinary Python code instead of using pandas' optimized column operations. Because a Series has a single dtype, the original column dtypes are also not preserved across a row.

2. **Modifying data**: Be cautious when modifying data inside an `iterrows()` loop. Depending on the column dtypes, the row you receive may be a copy rather than a view, so assigning to it may have no effect on the original DataFrame. Avoid modifying something you are iterating over; prefer a vectorized operation applied to the whole column.

3. **Alternative Approaches**: In most cases, there are more efficient and pandas-idiomatic ways to achieve the same result as `iterrows()`. Vectorized operations on whole columns are usually the fastest option. `apply()` is often more convenient than an explicit loop, although with `axis=1` it still calls a Python function once per row. When you do need an explicit loop, `itertuples()` is generally faster than `iterrows()`.
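The first two considerations can be observed directly. The sketch below uses two small DataFrames made up for illustration (`df` and `mixed` are not part of the exercise data); it shows that assigning to `row` leaves a mixed-dtype DataFrame unchanged, and that an integer column is upcast to float inside a row that also contains floats.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Pitfall 1: assigning to `row` changes only the per-row Series.
for index, row in df.iterrows():
    row['Age'] = row['Age'] + 1

ages_after_loop = df['Age'].tolist()        # the DataFrame still holds 25 and 30

# The vectorized update modifies the column itself, with no Python-level loop.
df['Age'] = df['Age'] + 1
ages_after_vectorized = df['Age'].tolist()

# Pitfall 2: a row Series has a single dtype, so integers are upcast
# to float when the row also contains floats.
mixed = pd.DataFrame({'count': [1, 2], 'price': [1.5, 2.5]})
_, first_row = next(mixed.iterrows())
print(mixed['count'].dtype, first_row['count'].dtype)
```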

In summary, while `iterrows()` can be useful for iterating through the rows of a DataFrame, it is generally not the most efficient approach for data analysis tasks. It is recommended to explore alternative methods provided by pandas for better performance and readability in your data analysis workflows.
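As a rough illustration of those alternatives, the sketch below applies three of them to the earlier example DataFrame. The new column names `AgeNextYear` and `Label` are invented for this example only.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'City': ['New York', 'San Francisco', 'Los Angeles']})

# Vectorized: one expression over the whole column.
df['AgeNextYear'] = df['Age'] + 1

# apply(axis=1): calls the function once per row; convenient,
# but still a Python-level loop under the hood.
df['Label'] = df.apply(lambda row: f"{row['Name']} ({row['City']})", axis=1)

# itertuples(): when an explicit loop is needed, each row arrives
# as a lightweight namedtuple with attribute access.
for row in df.itertuples(index=False):
    print(row.Name, row.Age, row.City)
```

Which option fits best depends on the task; for simple arithmetic or aggregation, the vectorized form is usually the one to try first.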

            

In [None]:
import pandas as pd

data = {
    'Fecha': ['2023-01-15', '2023-01-20', '2023-02-10', '2023-02-12', '2023-03-05', '2023-03-18'],
    'Producto': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Monto': [1000, 500, 800, 600, 1200, 900]
}

df = pd.DataFrame(data)

print(df)

        Fecha Producto  Monto
0  2023-01-15        A   1000
1  2023-01-20        B    500
2  2023-02-10        A    800
3  2023-02-12        B    600
4  2023-03-05        A   1200
5  2023-03-18        B    900


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Fecha     6 non-null      object
 1   Producto  6 non-null      object
 2   Monto     6 non-null      int64 
dtypes: int64(1), object(2)
memory usage: 272.0+ bytes


In [None]:
# convert the Fecha column from strings to datetime
df['Fecha'] = pd.to_datetime(df['Fecha'])

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype         
---  ------    --------------  -----         
 0   Fecha     6 non-null      datetime64[ns]
 1   Producto  6 non-null      object        
 2   Monto     6 non-null      int64         
dtypes: datetime64[ns](1), int64(1), object(1)
memory usage: 272.0+ bytes


In [None]:
df['Mes'] = df['Fecha'].dt.month

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype         
---  ------    --------------  -----         
 0   Fecha     6 non-null      datetime64[ns]
 1   Producto  6 non-null      object        
 2   Monto     6 non-null      int64         
 3   Mes       6 non-null      int64         
dtypes: datetime64[ns](1), int64(2), object(1)
memory usage: 320.0+ bytes


In [None]:
df



       Fecha Producto  Monto  Mes
0 2023-01-15        A   1000    1
1 2023-01-20        B    500    1
2 2023-02-10        A    800    2
3 2023-02-12        B    600    2
4 2023-03-05        A   1200    3
5 2023-03-18        B    900    3
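A note on the `Mes` column: `dt.month` keeps only the month number, so the same month in different years would share a value. The exercise data covers a single year, so this does not matter here, but the sketch below uses two hypothetical dates (not from the dataset) to show the difference, along with two ways of keeping the year.

```python
import pandas as pd

# Hypothetical dates in the same month of different years.
fechas = pd.to_datetime(pd.Series(['2023-01-15', '2024-01-20']))

month_number = fechas.dt.month             # month only: the year is lost
year_month = fechas.dt.strftime('%Y-%m')   # strings such as '2023-01'
period = fechas.dt.to_period('M')          # monthly Period values

print(month_number.tolist())
print(year_month.tolist())
print(period.astype(str).tolist())
```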


In [None]:
# initialize a dictionary to accumulate the totals
total_de_ventas_por_producto = {}

In [None]:

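# Compute the totals with iterrows.
# The dictionary is reset here so that re-running this cell does not double-count.
total_de_ventas_por_producto = {}

for index, row in df.iterrows():
    producto = row["Producto"]
    mes = row["Fecha"].strftime('%Y-%m')
    monto = row["Monto"]

    # The key combines product and year-month, e.g. ('A', '2023-01')
    clave = (producto, mes)

    # Start at 0 the first time a key is seen, then accumulate
    total_de_ventas_por_producto[clave] = total_de_ventas_por_producto.get(clave, 0) + monto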


In [None]:
print(total_de_ventas_por_producto)

{('A', '2023-01'): 1000, ('B', '2023-01'): 500, ('A', '2023-02'): 800, ('B', '2023-02'): 600, ('A', '2023-03'): 1200, ('B', '2023-03'): 900}


In [None]:
# Show the sales totals by product and month

for (producto, mes), total in total_de_ventas_por_producto.items():
  print(f'Producto: {producto}, Mes: {mes} , Total de ventas: {total}')

Producto: A, Mes: 2023-01 , Total de ventas: 1000
Producto: B, Mes: 2023-01 , Total de ventas: 500
Producto: A, Mes: 2023-02 , Total de ventas: 800
Producto: B, Mes: 2023-02 , Total de ventas: 600
Producto: A, Mes: 2023-03 , Total de ventas: 1200
Producto: B, Mes: 2023-03 , Total de ventas: 900
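For comparison, the same totals can be computed without an explicit loop. The sketch below rebuilds the exercise DataFrame so that it runs on its own, then uses `groupby` with the year-month string as a second grouping key. The names `totales`, `totales_dict` and the `Mes` label given to the grouping Series are choices made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Fecha': pd.to_datetime(['2023-01-15', '2023-01-20', '2023-02-10',
                             '2023-02-12', '2023-03-05', '2023-03-18']),
    'Producto': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Monto': [1000, 500, 800, 600, 1200, 900],
})

# Year-month string for each row, used as a grouping key.
mes = df['Fecha'].dt.strftime('%Y-%m').rename('Mes')

# Group by product and year-month, then sum Monto within each group.
totales = df.groupby(['Producto', mes])['Monto'].sum()

# Convert to a plain dict keyed by (producto, mes), like the loop's dictionary.
totales_dict = totales.to_dict()
print(totales_dict)
```

The result holds the same keys and totals as the `iterrows()` loop, although `groupby` sorts the keys, so the order of entries can differ.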


## Takeaways

- loops
- dictionaries

GLOSSARY

Some of the functions and methods used here.



## strftime

`strftime` is a method of Python date and datetime objects (and of pandas `Timestamp` values, as used in the loop above) that formats a date or time as a string. The format string `'%Y-%m'` represents a specific date format, where:

- `%Y` represents the year with century as a decimal number.
- `-` is a literal hyphen character.
- `%m` represents the month as a zero-padded decimal number.

When you use `strftime('%Y-%m')` with a date, it will format the date in the "YYYY-MM" format, where "YYYY" represents the year with century, and "MM" represents the zero-padded month.

Here's an example of how to use it in Python:

```python
from datetime import datetime

current_date = datetime.now()
formatted_date = current_date.strftime('%Y-%m')
print(formatted_date)
```

This code will print the current date in the "YYYY-MM" format, like "2023-10" for October 2023.
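Because `datetime.now()` changes with every run, a fixed date makes the behavior easier to check. The sketch below uses an arbitrary date, 15 January 2023, and also shows `strptime`, which parses a string using the same format directives.

```python
from datetime import datetime

fixed_date = datetime(2023, 1, 15)

print(fixed_date.strftime('%Y-%m'))      # 2023-01
print(fixed_date.strftime('%d/%m/%Y'))   # 15/01/2023

# strptime goes the other way: it parses a string using the same directives.
parsed = datetime.strptime('2023-01', '%Y-%m')
print(parsed.year, parsed.month)
```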