# Functions and Libraries

In programming, a function is a self-contained block of code that encapsulates a specific task or related group of tasks. Functions enhance readability, foster code reuse, and allow for more straightforward debugging. Within the context of a data science project, such as animal behavior classification, functions play a pivotal role in organizing the steps of the data processing and modeling workflow.
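As a minimal sketch of that idea, the hypothetical `summarize` helper below packages one task (computing a mean, minimum, and maximum) behind a name and a docstring. The function is then reused on two different made-up lists without repeating its logic. The names and numbers are illustrative only, not part of the project.

```python
def summarize(values):
    """Return the mean, minimum, and maximum of a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    mean = sum(values) / len(values)
    return {"mean": mean, "min": min(values), "max": max(values)}

# The same (hypothetical) function is reused for two different measurements.
durations = [12.0, 30.5, 47.5]
distances = [1.0, 2.0, 6.0]
print(summarize(durations))  # {'mean': 30.0, 'min': 12.0, 'max': 47.5}
print(summarize(distances))  # {'mean': 3.0, 'min': 1.0, 'max': 6.0}
```

Because the logic lives in one place, a bug fix inside `summarize` automatically applies to every call site.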

## Simulating Data and Saving to CSV

In data science we often create simulated data, either before real-world data is available or to test a new method. Because we control how simulated data is generated, it helps us understand how an algorithm might perform and lets us set up the workflow in advance.

Let's begin by creating a function that simulates a dataset for animal behavior:

```python
import csv
import random

def simulate_data(num_samples=1000, filename="simulated_dataset.csv"):
    """Simulate a dataset for animal behavior and save it to a CSV file."""
    dataset = []
    for _ in range(num_samples):
        # Each sample draws three features uniformly from a fixed range
        sample = {
            'activity_duration': random.uniform(0, 100),
            'proximity_to_others': random.uniform(0, 10),
            'noise_levels': random.uniform(50, 100)
        }
        dataset.append(sample)

    # Save the dataset to a CSV. Listing the columns explicitly (instead of
    # reading them from dataset[0]) also works when num_samples is 0.
    fieldnames = ['activity_duration', 'proximity_to_others', 'noise_levels']
    # newline='' is what the csv module expects when writing to a file
    with open(filename, 'w', newline='') as output_file:
        dict_writer = csv.DictWriter(output_file, fieldnames=fieldnames)
        dict_writer.writeheader()
        dict_writer.writerows(dataset)

    return f"Data simulated and saved to {filename}"

# Use the function to generate simulated data
print(simulate_data())
```

```
Data simulated and saved to simulated_dataset.csv
```
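Because `simulate_data` uses the global `random` generator, every run produces different numbers. A minimal sketch of one alternative design, assuming we want reproducible data for testing a method: the hypothetical helpers `simulate_rows` and `write_rows` split generation from saving, and a private `random.Random(seed)` generator makes the same seed yield the same rows. The sketch writes to an in-memory buffer (`io.StringIO`) so it runs without touching the disk.

```python
import csv
import io
import random

FIELDS = ['activity_duration', 'proximity_to_others', 'noise_levels']

def simulate_rows(num_samples=1000, seed=None):
    """Hypothetical helper: generate simulated rows; the same seed gives identical rows."""
    rng = random.Random(seed)  # private generator, leaves the global random state alone
    return [
        {
            'activity_duration': rng.uniform(0, 100),
            'proximity_to_others': rng.uniform(0, 10),
            'noise_levels': rng.uniform(50, 100),
        }
        for _ in range(num_samples)
    ]

def write_rows(rows, file_obj):
    """Hypothetical helper: write rows as CSV to any open text file object."""
    writer = csv.DictWriter(file_obj, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)

# Same seed, same data
rows_a = simulate_rows(5, seed=42)
rows_b = simulate_rows(5, seed=42)
print(rows_a == rows_b)  # True

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
write_rows(rows_a, buffer)
print(buffer.getvalue().splitlines()[0])  # activity_duration,proximity_to_others,noise_levels
```

To write to disk instead, open the file with `newline=''` as the original function does, e.g. `with open("simulated_dataset.csv", "w", newline="") as f: write_rows(simulate_rows(seed=42), f)`. Keeping generation and saving separate also makes each step easier to test on its own.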


## Data Extraction Function
Once the data is saved, the next step in the workflow is to load and preprocess it. Wrapping both steps in a function lets us repeat them on any file with the same columns:

```python
import pandas as pd

def extract_and_preprocess(filename="simulated_dataset.csv"):
    """Load data from the provided CSV file and standardize its features."""

    # Load data from CSV
    data = pd.read_csv(filename)

    # Select the feature columns
    features = data[['activity_duration', 'proximity_to_others', 'noise_levels']]

    # Add further preprocessing steps here as the project requires.
    # For demonstration, we standardize each feature to zero mean and unit
    # (sample) standard deviation, i.e. convert it to z-scores.
    features_normalized = (features - features.mean()) / features.std()

    return features_normalized

# Use the function to extract and preprocess data
processed_data = extract_and_preprocess()
print(processed_data.head())
```

```
   activity_duration  proximity_to_others  noise_levels
0          -0.750392            -1.557333      0.626784
1          -1.708669            -0.529892      1.078318
2          -0.725235             0.147800     -1.442999
3           1.559831            -0.196572      0.482220
4           0.940615             0.324937     -1.220871
```
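The preprocessing step above computes z-scores: each column has its mean subtracted and is divided by its standard deviation. The sketch below checks that property on a tiny hand-made DataFrame (not the simulated dataset): after standardization every column has mean 0 and standard deviation 1. Note that pandas `DataFrame.std` uses the sample standard deviation (`ddof=1`) by default, whereas NumPy's `np.std` and scikit-learn's `StandardScaler` use the population version (`ddof=0`), so their outputs differ slightly on the same data.

```python
import pandas as pd

# A tiny hand-made frame, used only to check the formula
frame = pd.DataFrame({'activity_duration': [1.0, 2.0, 3.0, 4.0, 5.0],
                      'noise_levels': [50.0, 60.0, 70.0, 80.0, 90.0]})

# Same z-score formula as extract_and_preprocess (pandas std defaults to ddof=1)
standardized = (frame - frame.mean()) / frame.std()

print(standardized['activity_duration'].round(4).tolist())
# [-1.2649, -0.6325, 0.0, 0.6325, 1.2649]
print(standardized.mean().abs().max() < 1e-12)  # True: every column is centered at 0
print(standardized.std().round(6).tolist())     # [1.0, 1.0]: unit sample std per column
```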


# Classes in Python
In object-oriented programming, a class is a blueprint for creating objects. Classes encapsulate data (attributes) and provide methods to interact with that data. This section does not cover object-oriented programming in depth, but a basic understanding of classes is essential because we will encounter them later, particularly when we explore PyTorch in the deep learning chapter.
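As a minimal, generic sketch of "blueprint" versus "object", the hypothetical `Observation` class below stores two attributes on each instance and defines one method that reads them through `self`. The class name, attributes, and the 50.0 threshold are illustrative assumptions, not part of the project.

```python
class Observation:
    """Hypothetical example: one recorded bout of animal activity."""

    def __init__(self, animal_id, activity_duration):
        # Attributes: data stored on each individual object (instance)
        self.animal_id = animal_id
        self.activity_duration = activity_duration

    def is_active(self, threshold=50.0):
        # Method: behavior that uses this instance's own data via `self`
        return self.activity_duration > threshold

# One blueprint, two independent objects
first = Observation("A1", 72.5)
second = Observation("B7", 18.0)
print(first.is_active(), second.is_active())  # True False
```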

Let's demonstrate classes by reorganizing our previous extraction function as a class that keeps the filename, the raw data, and the processed features together:

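```python
import pandas as pd

class AnimalBehaviorData:
    """Keep the data file, the raw data, and the processed features in one object."""

    def __init__(self, filename="simulated_dataset.csv"):
        self.filename = filename
        self.data = None            # raw DataFrame, set by extract_and_preprocess()
        self.processed_data = None  # standardized features, set by extract_and_preprocess()

    def extract_and_preprocess(self):
        """Load the CSV and store standardized (z-score) features on the instance."""
        self.data = pd.read_csv(self.filename)
        features = self.data[['activity_duration', 'proximity_to_others', 'noise_levels']]
        self.processed_data = (features - features.mean()) / features.std()

    def display_data(self):
        """Print the first five rows of the processed features."""
        if self.processed_data is None:
            raise RuntimeError("Call extract_and_preprocess() before display_data().")
        print(self.processed_data.head())

# Create an instance of the class and use its methods
dataset = AnimalBehaviorData()
dataset.extract_and_preprocess()
dataset.display_data()
```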

```
   activity_duration  proximity_to_others  noise_levels
0          -0.750392            -1.557333      0.626784
1          -1.708669            -0.529892      1.078318
2          -0.725235             0.147800     -1.442999
3           1.559831            -0.196572      0.482220
4           0.940615             0.324937     -1.220871
```
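One possible design variation, sketched under assumptions: instead of requiring an explicit `extract_and_preprocess()` call, a class can compute its processed data the first time it is requested and cache the result. The hypothetical `LazyBehaviorData` below takes an in-memory DataFrame (so it runs without a CSV file) and uses Python's built-in `@property` decorator; it is an illustration, not a replacement for the class above.

```python
import pandas as pd

FEATURES = ['activity_duration', 'proximity_to_others', 'noise_levels']

class LazyBehaviorData:
    """Hypothetical variant: standardizes the features the first time
    `processed` is read, then returns the cached result."""

    def __init__(self, frame):
        self._raw = frame
        self._processed = None  # filled in on first access

    @property
    def processed(self):
        # Compute once; later accesses reuse the cached DataFrame
        if self._processed is None:
            features = self._raw[FEATURES]
            self._processed = (features - features.mean()) / features.std()
        return self._processed

frame = pd.DataFrame({
    'activity_duration': [10.0, 20.0, 30.0],
    'proximity_to_others': [1.0, 5.0, 9.0],
    'noise_levels': [55.0, 70.0, 85.0],
})
data = LazyBehaviorData(frame)
print(data.processed)
print(data.processed is data.processed)  # True: the cached DataFrame is returned
```

The trade-off is convenience versus explicitness: callers can no longer forget the preprocessing step, but the moment the computation happens is less visible.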
