# 10 Advanced Python Tricks for Data Scientists
As a data scientist, you spend a lot of time writing Python code for data manipulation, analysis, and modeling. Python is already approachable and versatile, but a few lesser-known tricks can boost your productivity and make your code cleaner, faster, and more efficient.

Josep Ferrer - Analytics Engineer & Technical Writer
[databites.tech](databites-tech)

In [6]:
! pip install pandas_profiling pydantic==1.10.9 numpy==1.25.2 ipywidgets


In [7]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import pandas_profiling  # Newer releases of this package are published as ydata-profiling

## 1. pandas_profiling: Generate a Detailed Report of Your Dataset

In [8]:
# Load the dataset (replace the path with the location of your local CSV file)
data = pd.read_csv('time_series_dataset/climate_change_data.csv')

# Generate report
profile = pandas_profiling.ProfileReport(data)
profile.to_notebook_iframe()



## 2. F-Strings: Fast and Clean String Formatting

In [9]:
name = 'Alice'
age = 25
print(f'Hello, my name is {name} and I am {age} years old.')

Hello, my name is Alice and I am 25 years old.
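F-strings also accept format specifiers after a colon, which helps when reporting metrics. The sketch below is illustrative only: `accuracy` and `n_rows` are hypothetical values, not results from this notebook.

```python
# Hypothetical values, chosen only to illustrate format specifiers
accuracy = 0.87654
n_rows = 1234567

# .2% renders a fraction as a percentage with two decimals;
# the comma inserts thousands separators
summary = f"Accuracy: {accuracy:.2%} on {n_rows:,} rows"
print(summary)  # Accuracy: 87.65% on 1,234,567 rows

# The = specifier (Python 3.8+) prints the expression together with its value
debug = f"{n_rows=}"
print(debug)  # n_rows=1234567
```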


## 3. Lambda Functions: Inline and Anonymous Functions

In [10]:
points = [(1, 2), (3, 1), (5, -1)]
points.sort(key=lambda x: x[1])
print(points)

[(5, -1), (3, 1), (1, 2)]
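The same kind of key function works with other built-ins. As a small sketch, `sorted()` returns a new list instead of sorting in place, and `max()` takes a `key` as well.

```python
points = [(1, 2), (3, 1), (5, -1)]

# sorted() returns a new list and leaves the original untouched
by_y_desc = sorted(points, key=lambda p: p[1], reverse=True)
print(by_y_desc)  # [(1, 2), (3, 1), (5, -1)]

# max() accepts a key function too: here, the point farthest from the origin
farthest = max(points, key=lambda p: p[0] ** 2 + p[1] ** 2)
print(farthest)  # (5, -1)
```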


## 4. zip: Combine Multiple Lists Simultaneously

In [11]:
list1 = ['Data', 'Machine', 'Deep']
list2 = ['Science', 'Learning', 'Learning']
list3 = ['Fundamentals', 'Models', 'Neural Networks']

for word1, word2, word3 in zip(list1, list2, list3):
    print(word1, word2, word3)

Data Science Fundamentals
Machine Learning Models
Deep Learning Neural Networks
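Beyond looping, `zip` is handy for building dictionaries and for "unzipping" pairs back into separate sequences. The feature names and readings below are hypothetical. Note that `zip` stops at the shortest input.

```python
names = ['temperature', 'humidity', 'wind_speed']  # hypothetical feature names
values = [21.5, 0.63, 12.0]                        # hypothetical readings

# Pair keys with values to build a dictionary
record = dict(zip(names, values))
print(record)

# zip(*pairs) reverses the operation ("unzips" the pairs)
pairs = list(zip(names, values))
unzipped_names, unzipped_values = zip(*pairs)
print(unzipped_names)

# zip stops at the shortest input, silently dropping the extras
short = list(zip([1, 2, 3], ['a', 'b']))
print(short)  # [(1, 'a'), (2, 'b')]
```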


## 5. itertools: Advanced Iterations Made Easy

In [12]:
from itertools import combinations
items = ['A', 'B', 'C']
print(list(combinations(items, 2)))

[('A', 'B'), ('A', 'C'), ('B', 'C')]
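Two other `itertools` functions come up often: `permutations`, where order matters, and `product`, which builds a Cartesian product. The hyperparameter values in this sketch are hypothetical.

```python
from itertools import permutations, product

items = ['A', 'B', 'C']

# permutations: order matters, so ('A', 'B') and ('B', 'A') both appear
perms = list(permutations(items, 2))
print(len(perms))  # 6

# product: Cartesian product, e.g. a small hyperparameter grid
# (the parameter values are hypothetical)
learning_rates = [0.01, 0.1]
depths = [3, 5]
grid = list(product(learning_rates, depths))
print(grid)  # [(0.01, 3), (0.01, 5), (0.1, 3), (0.1, 5)]
```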


## 6. NumPy Broadcasting: Eliminate Explicit Loops

In [13]:
import numpy as np

arr = np.array([1, 2, 3])
print(arr + 10)

[11 12 13]
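Broadcasting also applies between arrays of different shapes, not only between an array and a scalar. A minimal sketch with made-up numbers: centering the columns of a matrix, and combining a column vector with a row vector.

```python
import numpy as np

matrix = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0]])   # shape (3, 2)

# Column means have shape (2,), which broadcasts across all 3 rows
col_means = matrix.mean(axis=0)
centered = matrix - col_means
print(centered)

# A (3, 1) column and a (3,) row broadcast to a (3, 3) grid
col = np.array([[0], [10], [20]])
row = np.array([1, 2, 3])
grid = col + row
print(grid.shape)  # (3, 3)
```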


## 7. Matplotlib Subplots: Organize Multiple Visualizations

In [14]:
import matplotlib.pyplot as plt
fig, axes = plt.subplots(2, 2)

axes[0, 0].plot([1, 2, 3], [1, 4, 9])  # Top-left
axes[0, 1].plot([1, 2, 3], [1, 3, 6])  # Top-right
axes[1, 0].plot([1, 2, 3], [3, 2, 1])  # Bottom-left
axes[1, 1].plot([1, 2, 3], [2, 2, 2])  # Bottom-right

plt.show()



## 8. apply(): Optimize DataFrame Transformations

In [15]:
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df['sum'] = df.apply(lambda row: row['a'] + row['b'], axis=1)
print(df)

   a  b  sum
0  1  4    5
1  2  5    7
2  3  6    9
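Row-wise `apply` calls a Python function once per row, so for simple arithmetic a vectorized column expression is usually the faster choice. The sketch below compares the two on the same toy DataFrame. The column names `sum_apply` and `sum_vectorized` are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# Row-wise apply: the lambda runs once for every row
df['sum_apply'] = df.apply(lambda row: row['a'] + row['b'], axis=1)

# Vectorized equivalent: operates on whole columns at once
df['sum_vectorized'] = df['a'] + df['b']

# Both approaches produce the same values
same = bool((df['sum_apply'] == df['sum_vectorized']).all())
print(same)
```

Reserve `apply` for logic that cannot be expressed with column operations.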


## 9. map(): Fast and Clean Data Transformations

In [16]:
numbers = [1, 2, 3, 4]
squares = list(map(lambda x: x**2, numbers))
print(squares)  # Output: [1, 4, 9, 16]


[1, 4, 9, 16]
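A few related details, sketched with hypothetical weights: a list comprehension expresses the same transformation, `map()` can take several iterables at once, and in Python 3 it returns a lazy iterator.

```python
numbers = [1, 2, 3, 4]

# A list comprehension expresses the same transformation
squares = [x ** 2 for x in numbers]
print(squares)  # [1, 4, 9, 16]

# map() accepts several iterables and passes one item from each
weights = [0.5, 0.25, 0.125, 0.125]  # hypothetical weights
weighted = list(map(lambda x, w: x * w, numbers, weights))
print(weighted)  # [0.5, 0.5, 0.375, 0.5]

# map() is lazy: nothing is computed until the result is consumed
lazy = map(str, numbers)
first = next(lazy)
print(first)  # 1
```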


## 10. Scikit-learn Pipelines: Automate Your Workflow

In [20]:
! pip install scikit-learn



In [21]:
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assuming `data` is the DataFrame loaded in section 1
# Select the numeric feature columns for the pipeline
X = data[['Temperature', 'CO2 Emissions', 'Sea Level Rise', 'Precipitation', 'Humidity', 'Wind Speed']]

# Create a binary target column for demonstration purposes:
# 1 if Temperature > 15, otherwise 0.
# Note: Temperature is also a feature in X, so the target can be read
# directly from the inputs. That is fine for a demo, but it would be
# target leakage in a real model.
data['High_Temperature'] = (data['Temperature'] > 15).astype(int)
y = data['High_Temperature']

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),        # Scale the features
    ('classifier', LogisticRegression()) # Apply logistic regression
])

# Fit the pipeline on the training data
pipeline.fit(X_train, y_train)

# Make predictions
predictions = pipeline.predict(X_test)

print("Predictions:", predictions)


Predictions: [1 1 1 ... 1 0 0]
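Because a pipeline behaves like a single estimator, it can be scored and cross-validated directly. The sketch below is self-contained and uses synthetic data from `make_classification` as a stand-in. That is an assumption made so the example runs without the climate CSV, and the names `X_demo`, `y_demo`, and `demo_pipeline` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; not the climate dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

demo_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', LogisticRegression()),
])
demo_pipeline.fit(X_tr, y_tr)

# score() runs the scaler and the classifier, then reports mean accuracy
test_accuracy = demo_pipeline.score(X_te, y_te)

# Cross-validation refits the whole pipeline for each fold, so the scaler
# only ever sees that fold's training rows
cv_scores = cross_val_score(demo_pipeline, X_demo, y_demo, cv=5)
print(test_accuracy, cv_scores.mean())
```

Keeping the scaler inside the pipeline is what prevents statistics from the held-out rows from influencing preprocessing.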
