<a href="https://colab.research.google.com/github/shrishtinigam/intermediate-python/blob/main/02_Efficient_Code_In_Pandas.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## List Comprehensions vs. Looping
https://stackoverflow.com/questions/22108488/are-list-comprehensions-and-functional-functions-faster-than-for-loops

The following are rough guidelines and educated guesses based on experience. You should time or profile your concrete use case to get hard numbers, and those numbers may occasionally disagree with the points below.

A list comprehension is usually slightly faster than the exactly equivalent `for` loop (one that actually builds a list), most likely because it doesn't have to look up the list and its `append` method on every iteration. However, a list comprehension still runs a loop at the bytecode level.
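
A minimal `timeit` sketch of the comparison in the paragraph above, assuming CPython 3.6 or later. The function names are hypothetical, and the `with_bound_append` variant (binding `result.append` to a local name before the loop) is a common micro-optimisation added here for illustration, not something from the original text. Absolute timings depend on the machine and Python version, so no numbers are shown.

```python
import timeit

N = 100_000

def with_loop():
    # Looks up result.append as an attribute on every iteration.
    result = []
    for i in range(N):
        result.append(i * i)
    return result

def with_bound_append():
    # Same loop, but the method lookup is hoisted out of the loop body.
    result = []
    append = result.append
    for i in range(N):
        append(i * i)
    return result

def with_comprehension():
    # Builds the same list with a list comprehension.
    return [i * i for i in range(N)]

# All three variants build exactly the same list.
assert with_loop() == with_bound_append() == with_comprehension()

for fn in (with_loop, with_bound_append, with_comprehension):
    # Best of 3 repeats, 10 calls each, to reduce noise from other processes.
    best = min(timeit.repeat(fn, number=10, repeat=3))
    print(f"{fn.__name__:>20}: {best:.4f} s per 10 calls")
```

`timeit` is generally more reliable for this than subtracting two `time.time()` readings, because it repeats the measurement and lets you take the best run.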

Using a list comprehension in place of a loop that doesn't build a list (accumulating a list of meaningless values only to throw it away) is often slower, because of the overhead of creating and extending that list. List comprehensions are not inherently faster than a good old loop.
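
A minimal sketch of the anti-pattern described above, with hypothetical names: `set.add` is called only for its side effect and returns `None`, so the comprehension version builds and discards a list of `None` values that the plain loop never creates. Timings are machine-dependent and are not shown.

```python
import timeit

data = list(range(100_000))
seen = set()

def loop_for_side_effect():
    # Plain loop: call seen.add for its side effect; nothing else is stored.
    for x in data:
        seen.add(x)

def comprehension_for_side_effect():
    # Anti-pattern: builds a 100,000-element list of None (the return
    # values of seen.add) and immediately throws it away.
    [seen.add(x) for x in data]

# The throwaway list holds nothing but the None return values.
print([seen.add(x) for x in data[:3]])  # [None, None, None]

for fn in (loop_for_side_effect, comprehension_for_side_effect):
    best = min(timeit.repeat(fn, number=10, repeat=3))
    print(f"{fn.__name__:>30}: {best:.4f} s per 10 calls")
```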

In [None]:
import time

In [None]:
list_comp_start_time = time.time()
result = [i*i for i in range(0,100000)]
list_comp_end_time = time.time()
print("Time using list comprehension: " + str(list_comp_end_time - list_comp_start_time))

Time using list comprehension: 0.02050042152404785


In [None]:
for_loop_start_time = time.time()
result = []  # start from an empty list so this cell builds the same 100,000-element list
for i in range(0,100000):
    result.append(i*i)
for_loop_end_time = time.time()
print("Time using for loop: " + str(for_loop_end_time - for_loop_start_time))

Time using for loop: 0.02550482749938965


## Appending Rows to a Pandas DataFrame
Adding rows to a pandas DataFrame can be necessary in many scenarios, but doing it repeatedly in a loop or iterative process is generally not considered best practice. DataFrames are not optimized for frequent appends: each append or concatenation returns a new DataFrame and copies all of the existing data, so growing a frame one row at a time takes time quadratic in the number of rows and wastes memory on intermediate copies.
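
To make that cost concrete, here is a minimal sketch (illustrative sizes and names, seeded random data) that builds the same 1,000-row DataFrame two ways: by calling `pd.concat` once per row inside a loop, and by collecting the rows in a list and constructing the DataFrame once. It assumes NumPy and pandas are installed; timings are machine-dependent and are not shown.

```python
import time

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
rows = rng.random((1_000, 3))  # 1,000 rows x 3 columns of random floats
columns = list("ABC")

# Anti-pattern: grow the DataFrame one row at a time. Every pd.concat call
# returns a new DataFrame and copies all rows accumulated so far.
start = time.perf_counter()
grown = pd.DataFrame([rows[0]], columns=columns)
for row in rows[1:]:
    grown = pd.concat([grown, pd.DataFrame([row], columns=columns)], ignore_index=True)
loop_seconds = time.perf_counter() - start

# Preferred: collect the rows in a plain list, then build the DataFrame once.
start = time.perf_counter()
records = []
for row in rows:
    records.append(row)
built = pd.DataFrame(records, columns=columns)
once_seconds = time.perf_counter() - start

# Both approaches produce the same data.
pd.testing.assert_frame_equal(grown, built)
print(f"row-by-row concat: {loop_seconds:.3f} s, build once: {once_seconds:.4f} s")
```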

Appending rows with the `DataFrame.append` method is inefficient; the method was deprecated in pandas 1.4.0 and removed in pandas 2.0. The recommended replacement is the `pd.concat` function.
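
A minimal before-and-after sketch of the migration, using a hypothetical two-column DataFrame. The old call is shown only as a comment, because `DataFrame.append` no longer exists in pandas 2.0 and later. A single new row given as a dict is wrapped in a one-row DataFrame before it is passed to `pd.concat`.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ada", "Grace"], "year": [1815, 1906]})
new_row = {"name": "Alan", "year": 1912}

# Before (deprecated in pandas 1.4.0, removed in 2.0):
#     df = df.append(new_row, ignore_index=True)
# After: wrap the dict in a one-row DataFrame and concatenate.
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
print(df)
```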

### Using `pd.concat`
* Basic Usage: `pd.concat` takes a list of DataFrames or Series to concatenate.
* Axis Parameter: by default, concatenation is along the rows (`axis=0`). To concatenate columns side by side, use `axis=1`.
* Ignore Index: use `ignore_index=True` to renumber the resulting index from 0, which is useful when the input indices overlap.
* Batch Concatenation: collect DataFrames or rows in a list and call `pd.concat` once at the end. This is much more efficient than concatenating inside a loop.
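
A short sketch of the parameters listed above, using two small hypothetical monthly DataFrames. The `keys` argument in the `axis=1` call is an extra `pd.concat` parameter not mentioned in the list; it labels each input so the two identically named `sales` columns stay distinguishable.

```python
import pandas as pd

jan = pd.DataFrame({"sales": [10, 12]}, index=[0, 1])
feb = pd.DataFrame({"sales": [9, 15]}, index=[0, 1])

# axis=0 (default): stack rows. Without ignore_index the labels 0, 1 repeat.
stacked = pd.concat([jan, feb])
print(stacked.index.tolist())  # [0, 1, 0, 1]

# ignore_index=True: discard the original labels and renumber from 0.
renumbered = pd.concat([jan, feb], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]

# axis=1: place the frames side by side, aligning rows on the index.
side_by_side = pd.concat([jan, feb], axis=1, keys=["jan", "feb"])
print(side_by_side.columns.tolist())  # [('jan', 'sales'), ('feb', 'sales')]

# Batch concatenation: collect the pieces in a list, call pd.concat once.
pieces = []
for month in range(1, 4):
    pieces.append(pd.DataFrame({"month": [month], "sales": [month * 10]}))
combined = pd.concat(pieces, ignore_index=True)
print(combined.shape)  # (3, 2)
```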

In [None]:
import numpy as np
import pandas as pd

In [None]:
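data = []
for y in range(1999, 2023):
    # One DataFrame of random values per year (365 daily rows, columns A, B, C)
    df_y = pd.DataFrame(np.random.rand(365, 3), index=pd.date_range(f'{y}-01-01', periods=365), columns=list('ABC'))
    data.append(df_y)
# Concatenate once after the loop instead of growing a DataFrame inside it
df_all = pd.concat(data)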