<div style="  background: linear-gradient(145deg, #0f172a, #1e293b);  border: 4px solid transparent;  border-radius: 14px;  padding: 18px 22px;  margin: 12px 0;  font-size: 26px;  font-weight: 600;  color: #f8fafc;  box-shadow: 0 6px 14px rgba(0,0,0,0.25);  background-clip: padding-box;  position: relative;">  <div style="    position: absolute;    inset: 0;    padding: 4px;    border-radius: 14px;    background: linear-gradient(90deg, #06b6d4, #3b82f6, #8b5cf6);    -webkit-mask:       linear-gradient(#fff 0 0) content-box,       linear-gradient(#fff 0 0);    -webkit-mask-composite: xor;    mask-composite: exclude;    pointer-events: none;  "></div>    <b>Introduction to Audio Data in Python</b>    <br/>  <span style="color:#9ca3af; font-size: 18px; font-weight: 400;">(Spoken Language Processing in Python)</span></div>

## Table of Contents

1. [Dealing with Audio Files in Python](#section-1)
2. [Frequency Examples & Importing Wave](#section-2)
3. [Opening an Audio File](#section-3)
4. [Converting Sound Wave Bytes to Integers](#section-4)
5. [Finding the Frame Rate](#section-5)
6. [Finding Sound Wave Timestamps](#section-6)
7. [Visualizing Sound Waves](#section-7)
8. [Conclusion](#conclusion)

***

<div style="background: #e0f2fe; border-left: 16px solid #0284c7; padding: 14px 18px; border-radius: 8px; font-size: 18px; color: #075985;"> 💡 <b>Tip: Setup Required</b> <br>To make this notebook fully executable, we will first generate dummy audio files (`good-morning.wav` and `good-afternoon.wav`) in the code block below. In a real-world scenario, you would load your own existing .wav files.</div>



In [None]:
# PREREQUISITE: GENERATING DUMMY DATA FOR THIS NOTEBOOK
# Run this cell first to create the audio files used in the examples.

import wave
import numpy as np

def create_dummy_wav(filename, freq=440, duration=2, framerate=48000):
    # Generate a sine wave
    t = np.linspace(0, duration, int(framerate * duration))
    amplitude = 10000 # Amplitude of the wave
    audio_data = (np.sin(2 * np.pi * freq * t) * amplitude).astype(np.int16)
    
    with wave.open(filename, "w") as f:
        f.setnchannels(1)      # Mono
        f.setsampwidth(2)      # 2 bytes (16-bit)
        f.setframerate(framerate)
        f.writeframes(audio_data.tobytes())
    print(f"Created {filename}")

# Create the files expected by the tutorial
create_dummy_wav("good-morning.wav", freq=440) # A4 Note
create_dummy_wav("good-afternoon.wav", freq=880) # A5 Note



<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 1. Dealing with Audio Files in Python</span><br>

When working with spoken language processing in Python, the first step is understanding the nature of the data. Audio files come in various formats, and digital sound is measured in specific ways that determine the quality and size of the data.

### Common Audio File Formats
There are several different kinds of audio files you might encounter:

| Extension | Description |
| :--- | :--- |
| **.mp3** | Compressed audio format, very common for music. |
| **.wav** | Uncompressed audio format, high quality, standard for processing. |
| **.m4a** | MPEG-4 Audio, often used by Apple devices. |
| **.flac** | Free Lossless Audio Codec, a compressed format that loses no audio quality. |

### Understanding Frequency
Digital audio is described by its **sampling frequency** (sample rate): the number of amplitude measurements stored per second, typically expressed in **kHz** (kilohertz).

*   **1 kHz** = 1,000 samples (pieces of information) per second.
*   A higher sampling frequency generally means higher audio quality but also a larger file.
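
To make the quality versus size trade-off concrete, here is a rough sketch of how the raw size of uncompressed audio grows with the sample rate. The helper name `uncompressed_size_bytes` and the 16-bit mono defaults are assumptions chosen for illustration, and the small file header is ignored.

```python
def uncompressed_size_bytes(sample_rate_hz, duration_s, sample_width_bytes=2, channels=1):
    """Approximate size of raw PCM audio, ignoring the file header."""
    return int(sample_rate_hz * duration_s * sample_width_bytes * channels)

# Compare 10 seconds of mono 16-bit audio at two sample rates
for rate in (16_000, 48_000):
    size = uncompressed_size_bytes(rate, duration_s=10)
    print(f"{rate / 1000:g} kHz -> {size:,} bytes")
# 16 kHz -> 320,000 bytes
# 48 kHz -> 960,000 bytes
```

Tripling the sample rate triples the amount of data for the same duration.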

<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 2. Frequency Examples & Importing Wave</span><br>

Different types of audio require different frequencies depending on the complexity of the sound.

### Typical Frequency Ranges

| Audio Type | Frequency (Sample Rate) |
| :--- | :--- |
| **Streaming Songs** | ~32 kHz (or 44.1 kHz standard) |
| **Audiobooks / Spoken Language** | Between 8 and 16 kHz |

Since we cannot "see" audio files directly like we do with images or text, we must transform them into a numerical format that Python can process. Python's standard library provides the `wave` module for reading `.wav` files.

### Importing the Library


In [None]:
# Standard Python library for working with wav files
import wave



<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 3. Opening an Audio File</span><br>

In this section, we will open an audio file named `good-morning.wav`. We treat audio files similarly to text files, opening them in read mode (`"r"`).

### The Process
1.  Import the audio file as a `wave` object.
2.  Read the frames from the object.
3.  The output is initially in **bytes**, which is not immediately human-readable.

#### Original Code Concept


In [None]:
# Import audio file as wave object
good_morning = wave.open("good-morning.wav", "r")

# Convert wave object to bytes
# readframes(-1) reads all frames in the file
good_morning_soundwave = good_morning.readframes(-1)

# View the wav file in byte form
print(good_morning_soundwave[:20]) # Printing first 20 bytes to avoid clutter



#### Enhanced Executable Code


In [None]:
import wave

# 1. Open the wave file
# We use the file we generated in the Setup step
good_morning = wave.open("good-morning.wav", "r")

# 2. Read all frames as bytes
good_morning_soundwave = good_morning.readframes(-1)

# 3. Inspect the raw byte data
print(f"Type of data: {type(good_morning_soundwave)}")
print(f"First 50 bytes: {good_morning_soundwave[:50]}")



<div style="background: #e0f2fe; border-left: 16px solid #0284c7; padding: 14px 18px; border-radius: 8px; font-size: 18px; color: #075985;"> 💡 <b>Tip:</b> Working with audio is different from other data types. A very small sample of audio (even a few seconds) contains a large amount of information. The raw byte output (e.g., `b'\xfd\xff...`) must be converted to be useful. </div>
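
Before converting the bytes, it can help to check how the file is laid out. The sketch below writes a one-second silent file (the name `example.wav` and the 16 kHz rate are assumptions for illustration) and then reads back its metadata with a `with` block, which closes the file automatically.

```python
import os
import tempfile
import wave

# Write a tiny silent WAV file so the example runs on its own
path = os.path.join(tempfile.mkdtemp(), "example.wav")
with wave.open(path, "wb") as f:
    f.setnchannels(1)      # Mono
    f.setsampwidth(2)      # 2 bytes (16-bit)
    f.setframerate(16000)
    f.writeframes(b"\x00\x00" * 16000)  # 1 second of silence

# Read the metadata and the raw frames back
with wave.open(path, "rb") as f:
    n_channels = f.getnchannels()
    sample_width = f.getsampwidth()
    frame_rate = f.getframerate()
    n_frames = f.getnframes()
    raw = f.readframes(n_frames)

print(n_channels, sample_width, frame_rate, n_frames, len(raw))
# 1 2 16000 16000 32000
```

The number of raw bytes equals frames × sample width × channels, which tells you how the byte string should be decoded.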

<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 4. Converting Sound Wave Bytes to Integers</span><br>

Raw bytes are difficult to manipulate mathematically. To perform analysis or visualization, we must convert these bytes into integers. We use the **NumPy** library for this efficient conversion.

### Conversion Steps
1.  Import `numpy`.
2.  Use `np.frombuffer()` to convert the byte string into a NumPy array.
3.  Specify the data type (`dtype`). For standard `.wav` files, this is often `int16` (16-bit integers).
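
To see what the conversion does, the following sketch decodes four hand-written bytes two ways: with the standard-library `struct` module and with `np.frombuffer`. It assumes 16-bit signed little-endian samples, which is how 16-bit PCM `.wav` data is stored; the byte values are made up for illustration.

```python
import struct
import numpy as np

# Four bytes = two 16-bit little-endian samples
raw = b"\x01\x00\xff\xff"

# Standard library: "<2h" means two little-endian signed 16-bit integers
from_struct = struct.unpack("<2h", raw)

# NumPy: "<i2" is an explicitly little-endian int16
from_numpy = np.frombuffer(raw, dtype="<i2")

print(from_struct)          # (1, -1)
print(from_numpy.tolist())  # [1, -1]
```

Each pair of bytes becomes one integer, so the resulting array has half as many elements as the byte string has bytes.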

#### Code Implementation


In [None]:
import numpy as np

# Convert good_morning_soundwave from bytes to integers
# We use 'int16' because .wav audio is commonly stored with a 16-bit sample depth
signal_gm = np.frombuffer(good_morning_soundwave, dtype='int16')

# Show the first 10 items
print("First 10 integer values of the signal:")
print(signal_gm[:10])

# Check the length to see how many data points we have
print(f"\nTotal data points: {len(signal_gm)}")



<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 5. Finding the Frame Rate</span><br>

The **Frame Rate** (or Sample Rate) tells us how many data points exist per second of audio. This is crucial for calculating the duration of the audio file.

### Key Formulas
1.  **Frame rate (Hz)** = `wave_object.getframerate()`
2.  **Duration (seconds)** = $\frac{\text{Number of samples in the signal array}}{\text{Frame rate (Hz)}}$ (for mono audio, where each frame holds one sample)
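
The duration formula can be sketched as a small function. The name `duration_seconds` and the `n_channels` parameter are assumptions added for illustration: in a stereo file the decoded array interleaves the channels, so its length has to be divided by the channel count before dividing by the frame rate.

```python
def duration_seconds(n_samples, frame_rate, n_channels=1):
    """Duration computed from the length of the decoded signal array."""
    # One frame holds one sample per channel
    n_frames = n_samples // n_channels
    return n_frames / frame_rate

print(duration_seconds(96_000, 48_000))                 # 2.0
print(duration_seconds(192_000, 48_000, n_channels=2))  # 2.0
```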

#### Code Implementation


In [None]:
# Get the frame rate from the wave object
framerate_gm = good_morning.getframerate()

# Show the frame rate
print(f"Frame Rate: {framerate_gm} Hz")

# Calculate duration
# We use the length of the numpy array (signal_gm) divided by the framerate
duration = len(signal_gm) / framerate_gm
print(f"Duration: {duration} seconds")



<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 6. Finding Sound Wave Timestamps</span><br>

To visualize audio, we need an X-axis representing **Time**. Since we only have the signal amplitude (Y-axis), we must generate the timestamps mathematically.

We use `np.linspace()` to generate evenly spaced values.

### Understanding `np.linspace`
`np.linspace(start, stop, num)` returns `num` evenly spaced samples, calculated over the interval `[start, stop]`.
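
One detail worth knowing: because `stop` is included by default, the spacing between values is `(stop - start) / (num - 1)`, which is slightly larger than `1 / frame_rate` when `stop` is the duration. The sketch below uses a deliberately tiny, hypothetical frame rate to make the difference visible. For plotting a real recording the difference is negligible.

```python
import numpy as np

frame_rate = 4   # hypothetical, tiny so the numbers are easy to read
n_samples = 8    # 2 seconds of audio at 4 Hz

# Default behaviour: the stop value is included
inclusive = np.linspace(0, n_samples / frame_rate, num=n_samples)

# endpoint=False keeps the spacing at exactly 1 / frame_rate
exclusive = np.linspace(0, n_samples / frame_rate, num=n_samples, endpoint=False)

# Equivalent: sample index divided by the frame rate
by_index = np.arange(n_samples) / frame_rate

print(inclusive[1])                      # 2/7, roughly 0.2857
print(exclusive[1])                      # 0.25
print(np.allclose(exclusive, by_index))  # True
```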

#### Code Implementation


In [None]:
# Example of linspace logic
print("Linspace Example (1 to 10):")
print(np.linspace(start=1, stop=10, num=10))

# Get the timestamps of the good morning sound wave
# Start: 0 seconds
# Stop: Duration of the file (len(signal) / framerate)
# Num: Total number of data points (len(signal))

time_gm = np.linspace(start=0,
                      stop=len(signal_gm) / framerate_gm,
                      num=len(signal_gm))

# View first 10 timestamps
print("\nFirst 10 timestamps (seconds):")
print(time_gm[:10])



<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 7. Visualizing Sound Waves</span><br>

Visualizing audio data allows us to compare different sound waves. In this example, we will compare "Good Morning" with "Good Afternoon".

### Preparation
We need to process the second file (`good-afternoon.wav`) exactly the same way we processed the first one.
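
Since the same steps are repeated for every file, they could also be wrapped in a function. The helper `load_wav` below is hypothetical (it is not part of the `wave` module) and assumes a mono, 16-bit file; it is demonstrated on a short generated tone so the sketch runs on its own.

```python
import os
import tempfile
import wave
import numpy as np

def load_wav(path):
    """Hypothetical helper: return (timestamps, signal, frame_rate) for a mono 16-bit WAV."""
    with wave.open(path, "rb") as f:
        frame_rate = f.getframerate()
        raw = f.readframes(f.getnframes())
    signal = np.frombuffer(raw, dtype="<i2")
    time = np.linspace(0, len(signal) / frame_rate, num=len(signal))
    return time, signal, frame_rate

# Demo: write a 1-second 440 Hz tone, then load it back
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
tone = (np.sin(2 * np.pi * 440 * np.arange(8000) / 8000) * 10000).astype("<i2")
with wave.open(demo_path, "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)
    f.setframerate(8000)
    f.writeframes(tone.tobytes())

time_demo, signal_demo, rate_demo = load_wav(demo_path)
print(len(signal_demo), rate_demo)  # 8000 8000
```

The step-by-step version that follows does the same work inline, which makes each stage easier to inspect.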

#### Step 1: Process the Second File


In [None]:
# Open the second file
good_afternoon = wave.open("good-afternoon.wav", "r")

# Read frames and convert to integers
soundwave_ga = good_afternoon.readframes(-1)
signal_ga = np.frombuffer(soundwave_ga, dtype='int16')

# Get framerate
framerate_ga = good_afternoon.getframerate()

# Create timestamps
time_ga = np.linspace(start=0,
                      stop=len(signal_ga) / framerate_ga,
                      num=len(signal_ga))

print(f"Processed Good Afternoon: {len(signal_ga)} samples at {framerate_ga} Hz")



#### Step 2: Plotting with Matplotlib
We use `matplotlib.pyplot` to create the visualization. We will plot Time (x-axis) vs Amplitude (y-axis).



In [None]:
import matplotlib.pyplot as plt

# Initialize figure and setup title
plt.figure(figsize=(10, 6)) # Wider figure for better visibility in the notebook
plt.title("Good Afternoon vs. Good Morning")

# x and y axis labels
plt.xlabel("Time (seconds)")
plt.ylabel("Amplitude")

# Add good morning and good afternoon values
# Good Morning is drawn second with alpha=0.5 (semi-transparent) so the
# Good Afternoon wave underneath stays visible where the two overlap
plt.plot(time_ga, signal_ga, label="Good Afternoon")
plt.plot(time_gm, signal_gm, label="Good Morning", alpha=0.5)

# Create a legend and show our plot
plt.legend()
plt.show()



<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 8. Conclusion</span><br>

In this notebook, we explored the fundamentals of processing audio data in Python. We covered the entire pipeline from raw file to visualization:

1.  **Understanding Formats:** We learned about `.wav` files and frequency (kHz).
2.  **Loading Data:** We used the `wave` library to read raw bytes.
3.  **Data Conversion:** We used `numpy` to convert raw bytes into usable integers (`int16`).
4.  **Time Domain:** We calculated the duration and generated timestamps using `np.linspace`.
5.  **Visualization:** We used `matplotlib` to plot the sound waves, allowing for visual comparison of audio signals.

**Next Steps:**
*   Try applying Fast Fourier Transforms (FFT) to analyze the frequency content.
*   Explore the `librosa` library, which abstracts many of these steps for easier audio processing.
*   Use these techniques to prepare audio data for machine learning models (e.g., speech recognition).
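
As a starting point for the FFT suggestion, here is a minimal sketch using NumPy's `np.fft.rfft` on a synthetic tone. The 8 kHz sample rate and the 440 Hz test signal are assumptions for illustration; with a real recording you would pass in the decoded signal array and its frame rate instead.

```python
import numpy as np

frame_rate = 8000                       # hypothetical sample rate in Hz
t = np.arange(frame_rate) / frame_rate  # 1 second of timestamps
signal = np.sin(2 * np.pi * 440 * t)    # 440 Hz test tone

# rfft returns the spectrum for the non-negative frequencies of a real signal
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / frame_rate)

# The strongest frequency component should match the tone
peak_hz = freqs[np.argmax(spectrum)]
print(peak_hz)
```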
