# 1. Introduction

LJ Speech is a public domain speech dataset consisting of 13,100 short audio clips of a single speaker reading passages from 7 non-fiction books. A transcription is provided for each clip. Clips vary in length from 1 to 10 seconds and have a total length of approximately 24 hours.




# 2. Download LJSpeech dataset

## 2.1. Download from website

https://keithito.com/LJ-Speech-Dataset/

Metadata is provided in `metadata.csv`. This file consists of one record per line, delimited by the pipe character (`|`, 0x7c). The fields are:

- ID: this is the name of the corresponding .wav file
- Transcription: words spoken by the reader (UTF-8)
- Normalized Transcription: transcription with numbers, ordinals, and monetary units expanded into full words (UTF-8).
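A minimal sketch of reading the pipe-delimited metadata with Python's standard `csv` module. It assumes the file layout described above (three `|`-separated fields per line) and turns off CSV quote handling, because transcriptions can contain literal quotation marks. The function name `read_ljspeech_metadata` and the two sample records are hypothetical, made up for illustration.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_ljspeech_metadata(stream: TextIO) -> List[Dict[str, str]]:
    """Parse LJ Speech-style metadata: one record per line, fields split on '|'.

    QUOTE_NONE keeps quotation marks in the transcripts as ordinary characters
    instead of treating them as CSV quoting.
    """
    records = []
    reader = csv.reader(stream, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        clip_id, transcription, normalized = row
        records.append({
            "id": clip_id,  # name of the .wav file, without the extension
            "transcription": transcription,
            "normalized_transcription": normalized,
        })
    return records


# Hypothetical two-record sample in the same layout as the metadata file
sample = io.StringIO(
    'LJ000-0001|He paid $5 for "2" books.|He paid five dollars for "two" books.\n'
    "LJ000-0002|A second sample line.|A second sample line.\n"
)
rows = read_ljspeech_metadata(sample)
print(rows[0]["normalized_transcription"])
```

For the real file, pass an open text handle instead, e.g. `open(path_to_metadata_csv, encoding="utf-8", newline="")`, where `path_to_metadata_csv` points to wherever you extracted the archive.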

Each audio file is a single-channel 16-bit PCM WAV with a sample rate of 22050 Hz. 
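To confirm that a clip has this format, Python's built-in `wave` module can read the WAV header. The sketch below is one way to do it. The helper name `describe_wav` is hypothetical, and the example builds a silent 1-second clip in memory so it runs without the dataset.

```python
import io
import wave
from typing import BinaryIO, Dict, Union


def describe_wav(source: Union[str, BinaryIO]) -> Dict[str, float]:
    """Return channel count, bit depth, sample rate and duration of a PCM WAV."""
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,  # sample width is in bytes
            "sample_rate": rate,
            "duration_sec": frames / rate,
        }


# Build a 1-second silent clip in memory with the format described above
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(1)      # single channel
    out.setsampwidth(2)      # 2 bytes per sample = 16-bit PCM
    out.setframerate(22050)  # 22050 Hz
    out.writeframes(b"\x00\x00" * 22050)
buffer.seek(0)

info = describe_wav(buffer)
print(info)
```

On the real data, pass the path of any file from the extracted `wavs` folder instead of the in-memory buffer.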

**Statistics**

| Statistic | Value |
|---|---|
| Total clips | 13,100 |
| Total words | 225,715 |
| Total characters | 1,308,678 |
| Total duration | 23:55:17 (hh:mm:ss) |
| Mean clip duration | 6.57 s |
| Min clip duration | 1.11 s |
| Max clip duration | 10.10 s |
| Mean words per clip | 17.23 |
| Distinct words | 13,821 |
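The per-clip means in the statistics follow from the totals. This short check recomputes them, using only the numbers listed above.

```python
# Recompute the per-clip averages from the dataset totals
total_clips = 13_100
total_words = 225_715
h, m, s = map(int, "23:55:17".split(":"))  # total duration as hh:mm:ss

total_seconds = h * 3600 + m * 60 + s              # total duration in seconds
total_hours = total_seconds / 3600                 # just under 24 hours
mean_clip_duration = total_seconds / total_clips   # seconds per clip
mean_words_per_clip = total_words / total_clips    # words per clip

print(f"{total_seconds} s = {total_hours:.2f} h")
print(f"mean clip duration: {mean_clip_duration:.2f} s")
print(f"mean words per clip: {mean_words_per_clip:.2f}")
```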


## 2.2. Using Hugging Face

**What is Hugging Face?**
Hugging Face is an AI company known for developing user-friendly tools and libraries that make advanced machine learning models accessible and easy to use. Its core offerings include:

- **Transformers library**: a library of pre-trained models for text, speech, and vision tasks such as text classification, translation, speech recognition, and text generation.
- **Datasets library**: a library for downloading, preparing, and sharing datasets for a wide range of machine learning tasks, including audio.
- **Hugging Face Hub**: an online platform for sharing and discovering machine learning models and datasets.

The `datasets` library offers a convenient way to download and work with LJ Speech without handling the archive yourself.

We will use the `keithito/lj_speech` dataset from the Hugging Face Hub. To find the identifier of another dataset, search for its name on the Hub; the identifier is the `owner/name` string shown on its dataset page.

Here's a step-by-step tutorial to help you get started with this dataset.

**Setup**
Before you start, make sure the necessary libraries are installed. You'll need the Hugging Face `datasets` library with its audio extras (used to decode the clips), plus `librosa` and `matplotlib` for the plotting example later in this section.
```bash
pip install "datasets[audio]" librosa matplotlib
```

**Loading the Dataset**

First, import the necessary libraries and load the dataset using the datasets library.

```python
from datasets import load_dataset

# Load the dataset
dataset = load_dataset("keithito/lj_speech")
```

`load_dataset` downloads the data, caches it locally, and returns a `DatasetDict` whose keys are the available splits. LJ Speech is not pre-split, so it provides a single `train` split.

**Exploring the Dataset**

Let's take a look at the structure of the dataset to understand what it contains.

```python
# Print dataset information
print(dataset)

# Print a sample from the training set
print(dataset['train'][0])
```

**Understanding the Data**

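Each example pairs an audio clip with its transcription. According to the dataset card, the fields are `id`, `file` (the path of the source WAV), `audio` (a dict holding the decoded `array`, its `path`, and its `sampling_rate`), `text` (the original transcription), and `normalized_text`. Printing `dataset['train'].features` shows each field and its type.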
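Printing a whole example also dumps the full audio array, which is hard to read. A small helper can report the clip length instead. This is a sketch under the assumption that examples follow the field layout described above. The name `summarize_example` and the `fake_example` stand-in (1.5 s of silence) are hypothetical, so the code runs without downloading anything.

```python
from typing import Any, Dict


def summarize_example(example: Dict[str, Any]) -> Dict[str, Any]:
    """Condense one LJ Speech example into readable fields.

    Assumes an `audio` dict with a decoded `array` and its `sampling_rate`,
    as in the `keithito/lj_speech` dataset card.
    """
    audio = example["audio"]
    num_samples = len(audio["array"])  # works for lists and NumPy arrays
    return {
        "id": example["id"],
        "text": example["text"],
        "normalized_text": example["normalized_text"],
        "sampling_rate": audio["sampling_rate"],
        "num_samples": num_samples,
        "duration_sec": round(num_samples / audio["sampling_rate"], 2),
    }


# Hypothetical stand-in for dataset['train'][0]: 1.5 s of silence at 22050 Hz
fake_example = {
    "id": "LJ000-0000",
    "audio": {"array": [0.0] * 33075, "sampling_rate": 22050, "path": None},
    "text": "He paid $5.",
    "normalized_text": "He paid five dollars.",
}
print(summarize_example(fake_example))
```

With the real dataset loaded, call `summarize_example(dataset['train'][0])` instead.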

**Loading and Using the Audio Files**

If you want to process or listen to the audio files, you can use libraries such as librosa or pydub for audio handling.

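```python
import librosa
import librosa.display  # waveshow lives in the display submodule
import matplotlib.pyplot as plt

# Take the first clip from the training split loaded above
example = dataset['train'][0]

# The datasets library has already decoded the audio into a float array
y = example['audio']['array']
sr = example['audio']['sampling_rate']  # 22050 Hz for LJ Speech
# For a WAV file on disk (e.g. from the website download), use instead:
# y, sr = librosa.load(path_to_wav, sr=None)  # sr=None keeps the native rate

# Plot the waveform
plt.figure(figsize=(10, 4))
librosa.display.waveshow(y, sr=sr)
plt.title(f"Waveform of {example['id']}")
plt.show()
```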

**Save Data**