# 📊🧠 **Data Mastery: First Steps in Data Handling**
*Course: AI/DS Nexus by Reza Shokrzad*

Welcome to your **first hands-on notebook** for mastering data across modalities! This notebook walks you through:
- 🧾 Reading and exploring tabular `.csv` files
- 📚 Working with raw and structured **text data**
- 🖼️ Loading and transforming **image data**
- 🔊 Processing **audio signals**
- 🌐 Using real datasets and corpora from popular libraries like **NLTK**, **TorchVision**, and **Librosa**

Let's get started on your journey to become a data-savvy practitioner! 🚀

## 🧾 Section 1: Working with Tabular Data (.csv)


In [None]:
import pandas as pd

# 1. Read the CSV file
df = pd.read_csv("your_file.csv")

# 2. Show the first 5 rows
print(df.head())

# 3. Check the shape (rows, columns)
print("Shape:", df.shape)

# 4. Show column names
print("Columns:", df.columns.tolist())

# 5. Check data types of each column
print(df.dtypes)

# 6. Get basic summary statistics
print(df.describe())

# 7. Check for missing values
print(df.isnull().sum())

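# 8. See unique values in a specific column (e.g., "Category")
column = "Category"  # Change "Category" to your actual column name
if column in df.columns:
    print(df[column].unique())
else:
    print(f"Column {column!r} not found; available columns: {df.columns.tolist()}")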

# 9. Quick info summary
df.info()
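
The cell above needs a real file on disk. As a minimal sketch for trying the same steps without one, the example below reads a small CSV from an in-memory string. The column names and values are hypothetical sample data, not part of the course material.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for "your_file.csv".
# Two cells are left empty on purpose so the missing-value check has something to find.
csv_text = """Category,Price,Quantity
Fruit,1.50,10
Vegetable,0.80,
Fruit,2.25,4
Dairy,,7
"""

# read_csv accepts any file-like object, so a StringIO buffer works like a file path.
df = pd.read_csv(io.StringIO(csv_text))

missing_per_column = df.isnull().sum()
categories = df["Category"].unique().tolist()

print("Shape:", df.shape)
print("Missing values per column:")
print(missing_per_column)
print("Categories:", categories)
print("Rows per category:")
print(df["Category"].value_counts())
```

Because `Quantity` contains an empty cell, pandas stores that column as floating point rather than integer, which is worth checking with `df.dtypes` before doing arithmetic on it.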

## 📚 Section 2: Working with Text and Strings



In [None]:
text = "Hello, Python learners!"
print(text)
print(type(text))  # <class 'str'>
print(len(text))  # 23
print(text[0])     # H
print(text[-1])    # !
print(text[0:5])   # Hello
print(text.lower())
print(text.upper())
print(text.split())
print(text.replace("Python", "World"))

name = "Reza"
message = f"Welcome, {name}!"
print(message)

paragraph = """Data science is fun. Python makes it easier. Let's learn together!"""
sentences = paragraph.split('. ')
print("Number of sentences:", len(sentences))
words = paragraph.split()
print("Number of words:", len(words))
# split() keeps punctuation attached, so "fun." and "fun" would count as different words
print("Unique words:", set(words))
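
Splitting on whitespace leaves punctuation attached to words, and splitting on `'. '` misses sentences that end in `!` or `?`. One possible way to tidy this up with only the standard library is sketched below. The regular expressions are a simple assumption that suits this short paragraph, not a general-purpose tokenizer.

```python
import re
from collections import Counter

paragraph = "Data science is fun. Python makes it easier. Let's learn together!"

# Whitespace splitting keeps punctuation attached ("fun.", "together!").
raw_words = paragraph.split()

# Lowercase the text, then keep only runs of letters and apostrophes.
clean_words = re.findall(r"[a-z']+", paragraph.lower())

# Split after ., ! or ? when followed by whitespace, keeping the punctuation.
sentences = re.split(r"(?<=[.!?])\s+", paragraph)

word_counts = Counter(clean_words)

print("Raw words:", raw_words)
print("Clean words:", clean_words)
print("Sentences:", sentences)
print("Most common:", word_counts.most_common(3))
```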


## 📖 Section 3: Using NLTK for Text Corpus


In [None]:
# !pip install nltk
import nltk
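nltk.download('gutenberg')
nltk.download('punkt')      # tokenizer models used by word_tokenize
nltk.download('punkt_tab')  # newer NLTK releases look for this resource instead of 'punkt'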

from nltk.corpus import gutenberg
print(gutenberg.fileids())
text = gutenberg.raw('carroll-alice.txt')
print(text[:500])

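from nltk.tokenize import word_tokenize
tokens = word_tokenize(text)
print("Number of tokens:", len(tokens))  # tokens include punctuation marks

from nltk.probability import FreqDist
# Keep alphabetic tokens only, lowercased, so punctuation does not dominate the counts
words = [token.lower() for token in tokens if token.isalpha()]
fdist = FreqDist(words)
print(fdist.most_common(10))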


## 🖼️ Section 4: Image Data Loading & Display


In [None]:
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np

image = Image.open("your_image.jpg")
plt.imshow(image)
plt.axis('off')
plt.title("Loaded Image")
plt.show()

image_array = np.array(image)
print("Image shape:", image_array.shape)  # (height, width, channels) for an RGB image
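
A common source of confusion is that Pillow reports image size as `(width, height)`, while the NumPy array has shape `(height, width, channels)`. The sketch below illustrates this on a synthetic image, so no file is needed. The 48x64 gradient is a hypothetical stand-in for `your_image.jpg`.

```python
import numpy as np
from PIL import Image

# Build a synthetic RGB image: 48 rows (height) by 64 columns (width).
height, width = 48, 64
pixels = np.zeros((height, width, 3), dtype=np.uint8)
# Red channel ramps from 0 to 255 left to right; green and blue stay 0.
pixels[..., 0] = np.linspace(0, 255, width).astype(np.uint8)

image = Image.fromarray(pixels)  # mode is inferred as "RGB" for a (H, W, 3) uint8 array

# Pillow uses (width, height); NumPy uses (height, width, channels).
print("PIL size:", image.size)
print("Array shape:", np.array(image).shape)

# Two basic transformations: grayscale conversion and resizing.
gray_array = np.array(image.convert("L"))          # single channel, so no channel axis
resized_array = np.array(image.resize((32, 24)))   # resize takes (width, height)

print("Grayscale shape:", gray_array.shape)
print("Resized shape:", resized_array.shape)
```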


## 📦 Section 5: TorchVision + CIFAR-10 Images


In [None]:
# !pip install torchvision
import matplotlib.pyplot as plt  # needed for plotting if this cell runs on its own
import torchvision
import torchvision.transforms as T

transform = T.ToTensor()
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)

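image, label = trainset[0]
class_name = trainset.classes[label]  # map the integer label to its class name
print("Label:", label, f"({class_name})")
print("Image shape:", image.shape)  # (channels, height, width)

# Matplotlib expects (height, width, channels), so move the channel axis last
plt.imshow(image.permute(1, 2, 0))
plt.title(f"Label: {label} ({class_name})")
plt.axis('off')
plt.show()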

print("Classes:", trainset.classes)
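
The `permute(1, 2, 0)` call above reorders the axes from channels-first to channels-last. To see what that does without downloading CIFAR-10, here is a small sketch of the same reordering in NumPy. The random array is a hypothetical stand-in for one image tensor, and `np.transpose` is used as the NumPy counterpart of `permute`.

```python
import numpy as np

# Hypothetical stand-in for one CIFAR-10 tensor: 3 channels, 32x32 pixels, values in [0, 1).
rng = np.random.default_rng(seed=0)
chw = rng.random((3, 32, 32), dtype=np.float32)

# Move the channel axis last: (channels, height, width) -> (height, width, channels).
hwc = np.transpose(chw, (1, 2, 0))

# Scale to 8-bit integers, the value range most image files use.
hwc_uint8 = (hwc * 255).round().astype(np.uint8)

print("Channels-first shape:", chw.shape)
print("Channels-last shape:", hwc.shape)
print("dtype after scaling:", hwc_uint8.dtype)

# The same pixel is addressed with reordered indices.
print("Same value:", chw[2, 5, 7] == hwc[5, 7, 2])
```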


## 🔊 Section 6: Audio Data with Librosa

In [None]:
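# !pip install librosa
import matplotlib.pyplot as plt  # needed for plotting if this cell runs on its own
import librosa
import librosa.display

audio_path = "your_audio.wav"
# librosa.load resamples to 22,050 Hz mono by default; pass sr=None to keep the file's native rate
y, sr = librosa.load(audio_path)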
print("Audio signal shape:", y.shape)
print("Sampling rate:", sr)
print("Duration (s):", librosa.get_duration(y=y, sr=sr))

plt.figure(figsize=(10, 4))
librosa.display.waveshow(y, sr=sr)
plt.title("Waveform")
plt.xlabel("Time (s)")
plt.ylabel("Amplitude")
plt.tight_layout()
plt.show()
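
If no audio file is at hand, a test tone can be generated. The sketch below writes a one-second 440 Hz sine wave to a 16-bit WAV file using NumPy and the standard-library `wave` module, then checks that duration equals the number of samples divided by the sampling rate. The sample rate, frequency, and file name are hypothetical choices for illustration.

```python
import os
import tempfile
import wave

import numpy as np

sample_rate = 16000    # samples per second (hypothetical choice)
duration_s = 1.0
frequency_hz = 440.0   # concert pitch A

# Time stamp of each sample, then the sine wave at half of full scale.
n_samples = int(sample_rate * duration_s)
t = np.arange(n_samples) / sample_rate
signal = 0.5 * np.sin(2 * np.pi * frequency_hz * t)

# Convert floats in [-1, 1] to little-endian 16-bit PCM, as WAV files expect.
pcm = (signal * 32767).astype("<i2")

wav_path = os.path.join(tempfile.gettempdir(), "sine_440hz.wav")
with wave.open(wav_path, "wb") as wav_file:
    wav_file.setnchannels(1)            # mono
    wav_file.setsampwidth(2)            # 2 bytes = 16 bits per sample
    wav_file.setframerate(sample_rate)
    wav_file.writeframes(pcm.tobytes())

# Read the header back and derive the duration from it.
with wave.open(wav_path, "rb") as wav_file:
    n_frames = wav_file.getnframes()
    read_rate = wav_file.getframerate()

computed_duration = n_frames / read_rate
print("Wrote:", wav_path)
print("Frames:", n_frames, "Rate:", read_rate, "Duration (s):", computed_duration)
```

Setting `audio_path = wav_path` lets the Librosa cell run on this file. Since `librosa.load` resamples to 22,050 Hz unless `sr=None` is passed, the reported sampling rate may differ from the 16,000 Hz written here.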


## 🎙️ Section 7: Audio with TorchAudio

In [None]:
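# !pip install torchaudio
import matplotlib.pyplot as plt  # needed for plotting if this cell runs on its own
import torchaudio
# Note: the unused `torchaudio.transforms as T` import was dropped; it shadowed torchvision's `T`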

dataset = torchaudio.datasets.SPEECHCOMMANDS("./data", download=True)
waveform, sample_rate, label, *_ = dataset[0]

print("Label:", label)
print("Waveform shape:", waveform.shape)
print("Sample rate:", sample_rate)

plt.figure(figsize=(10, 3))
plt.plot(waveform.t().numpy())
plt.title(f"Label: {label}")
plt.xlabel("Sample index")  # the x-axis is the sample number, not seconds
plt.ylabel("Amplitude")
plt.tight_layout()
plt.show()
