### Q1 Data Cleaning: Write a function to handle missing values in a dataset. How would you decide which strategy to use for each column?

In [None]:
import pandas as pd
import numpy as np

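# Assuming df is your DataFrame and it contains a numeric column "column_name".
# Assign the result back instead of using inplace=True on a column selection,
# which may operate on a copy and leave df unchanged.
df['column_name'] = df['column_name'].fillna(df['column_name'].mean())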
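
The question also asks how to choose a strategy per column. One possible set of rules is sketched below as a function. The function name `fill_missing`, the 50% drop threshold, and the skewness cutoff of 1 are illustrative assumptions, not fixed conventions: very sparse columns are dropped, numeric columns get the median when strongly skewed and the mean otherwise, and all other columns get the most frequent value.

```python
import numpy as np
import pandas as pd

def fill_missing(df, max_missing_frac=0.5):
    """Return a copy of df with missing values handled column by column.

    The threshold and the skewness cutoff are illustrative assumptions.
    """
    out = df.copy()
    for col in df.columns:
        frac_missing = df[col].isna().mean()
        if frac_missing == 0:
            continue
        if frac_missing > max_missing_frac:
            # Too sparse to impute reliably: drop the column
            out = out.drop(columns=col)
        elif pd.api.types.is_numeric_dtype(df[col]):
            # The median resists outliers; the mean suits roughly symmetric data
            use_median = abs(df[col].skew()) > 1
            fill_value = df[col].median() if use_median else df[col].mean()
            out[col] = df[col].fillna(fill_value)
        else:
            # Categorical or text column: use the most frequent value
            out[col] = df[col].fillna(df[col].mode().iloc[0])
    return out

demo = pd.DataFrame({
    "age": [20, 30, np.nan, 40],
    "city": ["A", None, "A", "B"],
    "mostly_empty": [np.nan, np.nan, np.nan, 1.0],
})
cleaned = fill_missing(demo)
print(cleaned)
```

In practice the choice also depends on why values are missing, so rules like these are a starting point rather than a substitute for inspecting each column.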


### Q2 Data Manipulation: Given a dataset with a date column, write code to extract features such as month, year, day of the week, and whether the date is a holiday.

In [None]:
df['date'] = pd.to_datetime(df['date'])
df['year'], df['month'], df['day_of_week'] = df['date'].dt.year, df['date'].dt.month, df['date'].dt.dayofweek
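
The question also asks for a holiday flag, which the lines above do not produce. A minimal sketch follows, assuming US federal holidays are the relevant calendar. It uses `USFederalHolidayCalendar` from pandas; the helper name `add_holiday_flag` and the `is_holiday` column name are made up for illustration.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

def add_holiday_flag(df, date_col="date"):
    """Return a copy of df with a boolean is_holiday column (US federal holidays assumed)."""
    out = df.copy()
    dates = pd.to_datetime(out[date_col])
    # Build the holiday list only for the date range present in the data
    holidays = USFederalHolidayCalendar().holidays(start=dates.min(), end=dates.max())
    # normalize() drops the time of day so timestamps compare equal to holiday dates
    out["is_holiday"] = dates.dt.normalize().isin(holidays)
    return out

demo = add_holiday_flag(pd.DataFrame({"date": ["2023-07-04", "2023-07-05"]}))
print(demo)
```

For other countries, a different calendar would have to be supplied, for example a custom subclass of `AbstractHolidayCalendar`.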


### Q3 Machine Learning: Given a dataset, write code to train a binary classification model. Include preprocessing steps, model training, and evaluation.

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

X = df.drop('target', axis=1)
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

clf = RandomForestClassifier(n_estimators=100)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))
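
The code above trains and evaluates a model but skips the preprocessing the question asks for. One way to add it is a scikit-learn `Pipeline` with a `ColumnTransformer`, sketched below on a small synthetic frame. The column names `age` and `plan`, the helper `build_classifier`, and the data itself are assumptions made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

def build_classifier(numeric_cols, categorical_cols):
    """Impute numeric columns, one-hot encode categorical ones, then fit a random forest."""
    preprocess = ColumnTransformer([
        ("num", SimpleImputer(strategy="median"), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("model", RandomForestClassifier(n_estimators=100, random_state=42)),
    ])

# Synthetic data for illustration only
rng = np.random.default_rng(0)
age = rng.integers(18, 70, size=200).astype(float)
age[::25] = np.nan  # inject some missing values
demo = pd.DataFrame({"age": age, "plan": rng.choice(["basic", "pro"], size=200)})
demo["target"] = (demo["plan"] == "pro").astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    demo.drop(columns="target"), demo["target"],
    test_size=0.3, random_state=42, stratify=demo["target"],
)
clf = build_classifier(["age"], ["plan"]).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Keeping the preprocessing inside the pipeline means the imputer and encoder are fitted on the training split only, which avoids leaking test-set statistics into training.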


### Q4 Deep Learning: Explain how you would implement a convolutional neural network for image classification. Write code to define the architecture of the network using a library such as TensorFlow or PyTorch.

In [None]:
import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=(150, 150, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

### Q5 Natural Language Processing: Write a function that takes a sentence as input and returns a list of all named entities (people, places, organizations) in the sentence.

In [None]:
import spacy

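nlp = spacy.load('en_core_web_sm')

# spaCy labels: PERSON (people), GPE and LOC (places), ORG (organizations)
TARGET_LABELS = {'PERSON', 'GPE', 'LOC', 'ORG'}

def extract_named_ents(text):
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents if ent.label_ in TARGET_LABELS]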

print(extract_named_ents("Apple is looking at buying U.K. startup for $1 billion"))


### Q6 Time Series Analysis: Given a time series dataset, write code to decompose the series into trend, seasonal, and residual components.

In [None]:
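from statsmodels.tsa.seasonal import seasonal_decompose
import matplotlib.pyplot as plt

# Synthetic monthly series: linear trend + yearly seasonality + noise
t = np.arange(120)
series = pd.Series(0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + np.random.normal(0, 1, 120),
                   index=pd.date_range('2015-01-01', periods=120, freq='MS'))
# period must match the seasonal cycle (12 for monthly data); period=1 leaves no seasonal component
result = seasonal_decompose(series, model='additive', period=12)

result.plot()
plt.show()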

### Q7 SQL: Write a SQL query to find the top 5 customers with the highest total purchase amount in each month.

In [None]:
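SELECT month, customer_id, total_purchase
FROM (
    SELECT MONTH(purchase_date) AS month, customer_id, SUM(purchase_amount) AS total_purchase,
           ROW_NUMBER() OVER (PARTITION BY MONTH(purchase_date) ORDER BY SUM(purchase_amount) DESC) AS rn
    FROM purchases  -- assumed table name
    GROUP BY MONTH(purchase_date), customer_id
) ranked
WHERE rn <= 5 ORDER BY month, total_purchase DESC;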
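
A top-N-per-month query has to rank customers within each month; a single `LIMIT` only caps the overall result. The sketch below checks that idea on an in-memory SQLite database using Python's built-in `sqlite3` module. The table name `purchases`, the sample rows, and `TOP_N = 2` are assumptions for the demo. SQLite has no `MONTH()` function, so `strftime` is used, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (customer_id INTEGER, purchase_date TEXT, purchase_amount REAL)"
)
conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", [
    (1, "2024-01-05", 100.0), (1, "2024-01-20", 50.0),
    (2, "2024-01-11", 120.0), (3, "2024-01-15", 300.0),
    (1, "2024-02-02", 10.0), (2, "2024-02-09", 80.0),
])

TOP_N = 2  # use 5 for the question as stated
query = """
WITH monthly AS (
    -- Total spend per customer per calendar month
    SELECT strftime('%Y-%m', purchase_date) AS month,
           customer_id,
           SUM(purchase_amount) AS total_purchase
    FROM purchases
    GROUP BY month, customer_id
),
ranked AS (
    -- Rank customers inside each month by their total
    SELECT month, customer_id, total_purchase,
           ROW_NUMBER() OVER (PARTITION BY month ORDER BY total_purchase DESC) AS rn
    FROM monthly
)
SELECT month, customer_id, total_purchase
FROM ranked
WHERE rn <= ?
ORDER BY month, total_purchase DESC
"""
rows = conn.execute(query, (TOP_N,)).fetchall()
for row in rows:
    print(row)
```

`ROW_NUMBER` returns exactly N rows per month even when totals tie; `RANK` or `DENSE_RANK` would keep tied customers instead.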

### Q8 Optimization: Implement a gradient descent algorithm from scratch to find the minimum of a simple function (e.g., a quadratic function).

In [None]:
def gradient_descent(x_start, learning_rate, num_iterations):
    x = x_start
    for i in range(num_iterations):
        gradient = 2*x  # derivative of x^2
        x = x - learning_rate * gradient
    return x

print(gradient_descent(10, 0.1, 100))
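
The function above hard-codes the derivative of x^2 and always runs the full number of iterations. A slightly more general sketch is shown below, assuming the caller passes the derivative as a function. The parameter names `grad`, `tolerance`, and `max_iterations` are illustrative choices. It stops early once the update becomes negligibly small.

```python
def gradient_descent_general(grad, x_start, learning_rate=0.1,
                             max_iterations=1000, tolerance=1e-8):
    """Minimize a 1-D function given its derivative grad.

    Returns the final x and the number of update steps taken.
    """
    x = x_start
    steps = 0
    for _ in range(max_iterations):
        step = learning_rate * grad(x)
        x -= step
        steps += 1
        # Stop once the update is too small to matter
        if abs(step) < tolerance:
            break
    return x, steps

# f(x) = (x - 3)^2 has derivative 2 * (x - 3) and its minimum at x = 3
minimum, steps_taken = gradient_descent_general(lambda x: 2 * (x - 3), x_start=10.0)
print(minimum, steps_taken)
```

For this quadratic, any learning rate between 0 and 1 converges; a rate of 1 or more makes the iterates oscillate or diverge.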

### Q9 Statistics: Write a function to perform a hypothesis test comparing the means of two groups. The function should take as input two lists of numbers, and output the p-value of the test.

In [None]:
from scipy import stats

def t_test(group1, group2):
    t_stat, p_val = stats.ttest_ind(group1, group2)
    return p_val

print(t_test([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))

### Q10 Data Structure and Algorithms: Implement a data structure to store and retrieve customer data, with efficient methods for the following operations: insert a new customer, delete a customer, look up a customer by ID, and list all customers in order of their total purchase amount.

In [None]:
class CustomerDatabase:
    def __init__(self):
        self.customers = dict()
    
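    def insert(self, customer_id, name, purchase_amount):
        # Dict insert is O(1) on average; re-inserting an ID overwrites the old record
        self.customers[customer_id] = (name, purchase_amount)
    
    def delete(self, customer_id):
        # pop with a default avoids a KeyError for unknown IDs
        self.customers.pop(customer_id, None)
    
    def lookup(self, customer_id):
        # Returns None if the customer does not exist
        return self.customers.get(customer_id)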
    
    def list_customers(self):
        return sorted(self.customers.items(), key=lambda x: x[1][1], reverse=True)
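
The dictionary version above sorts all customers on every call to `list_customers`, which costs O(n log n) each time. If listing is frequent, one alternative is to keep a second, always-sorted index next to the dictionary. The sketch below does this with the standard `bisect` module. The class name `SortedCustomerDatabase` is made up, and the approach assumes customer IDs are mutually comparable (for example, all integers). Inserts and deletes cost O(n) because list elements shift, while listing needs no sort.

```python
import bisect

class SortedCustomerDatabase:
    """Dict for O(1) lookup plus a sorted list of (amount, id) for ordered listing."""

    def __init__(self):
        self._by_id = {}
        self._by_amount = []  # kept sorted in ascending order

    def insert(self, customer_id, name, purchase_amount):
        if customer_id in self._by_id:
            self.delete(customer_id)  # replace an existing record
        self._by_id[customer_id] = (name, purchase_amount)
        bisect.insort(self._by_amount, (purchase_amount, customer_id))

    def delete(self, customer_id):
        if customer_id not in self._by_id:
            return
        _, amount = self._by_id.pop(customer_id)
        # Binary search finds the exact (amount, id) entry
        idx = bisect.bisect_left(self._by_amount, (amount, customer_id))
        del self._by_amount[idx]

    def lookup(self, customer_id):
        return self._by_id.get(customer_id)

    def list_customers(self):
        # Highest total purchase amount first
        return [(cid, self._by_id[cid]) for _, cid in reversed(self._by_amount)]

db = SortedCustomerDatabase()
db.insert(1, "Ann", 50)
db.insert(2, "Bob", 200)
db.insert(3, "Cy", 120)
print(db.list_customers())
```

A balanced tree or skip list would bring inserts and deletes down to O(log n), but the standard library does not provide one.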

### Q11: Can you give me 10 different deep learning project examples with code in Python TensorFlow/Keras?

#### Example 1. Image Classification: Use a Convolutional Neural Network (CNN) to classify images in the CIFAR-10 dataset.

In [None]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])


#### Example 2. Text Classification: Use a Recurrent Neural Network (RNN) or Transformer to classify movie reviews as positive or negative (sentiment analysis).

In [None]:
vocab_size = 10000  # assumed vocabulary size; set this to match your tokenizer

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 32),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])


#### Example 3. Object Detection: Implement a model like YOLO (You Only Look Once) to detect and classify objects within an image.

Implementing a model like YOLO from scratch takes significant effort and expertise in computer vision and deep learning. The example below instead runs a pre-trained YOLO model through the Python bindings of the darknet library.

First, install darknet and download the pre-trained weights for the YOLO model. Installation instructions are in the darknet repository: https://github.com/AlexeyAB/darknet.

The code loads the pre-trained YOLO model using darknet.load_net and darknet.load_meta, then loads the input image and resizes it to the network's input size. After converting the image to the format darknet expects, it performs object detection using darknet.detect_image, draws a bounding box with a class label for each detection, and displays the result. Function names in the Python bindings may differ between darknet versions, so check them against the darknet.py that ships with your build.

Replace "yolo.cfg", "yolo.weights", and "coco.data" with the paths to the YOLO model configuration file, the pre-trained weights file, and the dataset configuration file, respectively. Replace "image.jpg" with the path to the image you want to run detection on.

In [None]:
import cv2
import darknet

# Load the pre-trained YOLO model
net = darknet.load_net(b"yolo.cfg", b"yolo.weights", 0)
meta = darknet.load_meta(b"coco.data")

# Load the image
image_path = "image.jpg"
image = cv2.imread(image_path)

# Resize the image to the required input size of YOLO
resized_image = cv2.resize(image, (darknet.network_width(net), darknet.network_height(net)), interpolation=cv2.INTER_LINEAR)

# Convert the image to the format darknet expects (OpenCV loads BGR, darknet expects RGB)
rgb_image = cv2.cvtColor(resized_image, cv2.COLOR_BGR2RGB)
darknet_image = darknet.make_image(darknet.network_width(net), darknet.network_height(net), 3)
darknet.copy_image_from_bytes(darknet_image, rgb_image.tobytes())

# Perform object detection
detections = darknet.detect_image(net, meta, darknet_image)

# Process the detections
for detection in detections:
    class_name = detection[0].decode()
    confidence = detection[1]
    x, y, w, h = detection[2]
    x1 = int(x - w / 2)
    y1 = int(y - h / 2)
    x2 = int(x + w / 2)
    y2 = int(y + h / 2)
    
    # Draw bounding box and label on the resized image, since the
    # detection coordinates refer to the network input size
    cv2.rectangle(resized_image, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.putText(resized_image, f"{class_name} ({confidence:.2f})", (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)

# Display the image with detections
cv2.imshow("Object Detection", resized_image)
cv2.waitKey(0)
cv2.destroyAllWindows()


#### Example 4. Neural Machine Translation: Build a Sequence-to-Sequence (Seq2Seq) model to translate text from one language to another.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
# Note: torchtext.legacy exists only in torchtext 0.9 to 0.11; later releases removed it
from torchtext.legacy.data import Field, BucketIterator
from torchtext.legacy.datasets import Multi30k
from torchtext.data.metrics import bleu_score

# Set random seeds for reproducibility
torch.manual_seed(0)
torch.backends.cudnn.deterministic = True

# Define the fields for source and target languages
SRC = Field(tokenize="spacy", tokenizer_language="en_core_web_sm", lower=True, init_token="<sos>", eos_token="<eos>")
TRG = Field(tokenize="spacy", tokenizer_language="de_core_news_sm", lower=True, init_token="<sos>", eos_token="<eos>")

# Load and split the Multi30k dataset
train_data, valid_data, test_data = Multi30k.splits(exts=(".en", ".de"), fields=(SRC, TRG))

# Build the vocabulary
SRC.build_vocab(train_data, min_freq=2)
TRG.build_vocab(train_data, min_freq=2)

# Define the Seq2Seq model
class Seq2Seq(nn.Module):
    def __init__(self, input_dim, output_dim, hidden_dim, num_layers, dropout):
        super(Seq2Seq, self).__init__()
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers
        # Separate embeddings: source and target have different vocabularies
        self.src_embedding = nn.Embedding(input_dim, hidden_dim)
        self.trg_embedding = nn.Embedding(output_dim, hidden_dim)
        self.encoder = nn.LSTM(hidden_dim, hidden_dim, num_layers, dropout=dropout)
        self.decoder = nn.LSTM(hidden_dim, hidden_dim, num_layers, dropout=dropout)
        self.fc = nn.Linear(hidden_dim, output_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, src, trg):
        # Encode the source sentence; only the final hidden and cell states are passed on
        _, (enc_hidden, enc_cell) = self.encoder(self.dropout(self.src_embedding(src)))
        # Teacher forcing: feed the gold target tokens, dropping the last one
        dec_input = self.dropout(self.trg_embedding(trg[:-1]))
        dec_output, _ = self.decoder(dec_input, (enc_hidden, enc_cell))
        output = self.fc(dec_output)  # shape: [trg_len - 1, batch_size, output_dim]
        return output

# Set device (CPU or GPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Define model hyperparameters
INPUT_DIM = len(SRC.vocab)
OUTPUT_DIM = len(TRG.vocab)
HIDDEN_DIM = 256
NUM_LAYERS = 2
DROPOUT = 0.5
BATCH_SIZE = 128

# Initialize the model
model = Seq2Seq(INPUT_DIM, OUTPUT_DIM, HIDDEN_DIM, NUM_LAYERS, DROPOUT).to(device)

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss(ignore_index=TRG.vocab.stoi[TRG.pad_token])
optimizer = optim.Adam(model.parameters())

# Create data iterators
train_iterator, valid_iterator, test_iterator = BucketIterator.splits(
    (train_data, valid_data, test_data),
    batch_size=BATCH_SIZE,
    device=device,
    sort_within_batch=True,
    sort_key=lambda x: len(x.src),
)

# Train the model for one epoch and return the average loss per batch
def train(model, iterator, optimizer, criterion):
    model.train()
    epoch_loss = 0
    for batch in iterator:
        src = batch.src
        trg = batch.trg
        optimizer.zero_grad()
        output = model(src, trg)
        output_dim = output.shape[-1]
        # Output position t predicts target token t + 1, so compare against trg[1:]
        loss = criterion(output.reshape(-1, output_dim), trg[1:].reshape(-1))
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    return epoch_loss / len(iterator)


#### Example 5. Generative Adversarial Network (GAN): Train a GAN to generate new images, like creating new artworks or synthesizing human faces.



In [None]:
import tensorflow as tf
from tensorflow.keras import layers, models
import numpy as np
import matplotlib.pyplot as plt

# Define the generator model
def build_generator(latent_dim):
    model = models.Sequential()
    model.add(layers.Dense(7 * 7 * 256, input_dim=latent_dim))
    model.add(layers.Reshape((7, 7, 256)))
    model.add(layers.Conv2DTranspose(128, kernel_size=3, strides=2, padding='same'))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU(alpha=0.01))
    model.add(layers.Conv2DTranspose(64, kernel_size=3, strides=1, padding='same'))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU(alpha=0.01))
    model.add(layers.Conv2DTranspose(1, kernel_size=3, strides=2, padding='same', activation='sigmoid'))
    return model

# Define the discriminator model
def build_discriminator():
    model = models.Sequential()
    model.add(layers.Conv2D(64, kernel_size=3, strides=2, padding='same', input_shape=(28, 28, 1)))
    model.add(layers.LeakyReLU(alpha=0.01))
    model.add(layers.Conv2D(128, kernel_size=3, strides=2, padding='same'))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU(alpha=0.01))
    model.add(layers.Conv2D(256, kernel_size=3, strides=2, padding='same'))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU(alpha=0.01))
    model.add(layers.Flatten())
    model.add(layers.Dense(1, activation='sigmoid'))
    return model

# Define the GAN model
def build_gan(generator, discriminator):
    discriminator.trainable = False
    model = models.Sequential()
    model.add(generator)
    model.add(discriminator)
    return model

# Define the loss functions
cross_entropy = tf.keras.losses.BinaryCrossentropy()

# Define the discriminator loss function
def discriminator_loss(real_output, fake_output):
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    total_loss = real_loss + fake_loss
    return total_loss

# Define the generator loss function
def generator_loss(fake_output):
    return cross_entropy(tf.ones_like(fake_output), fake_output)

# Define the optimizers
generator_optimizer = tf.keras.optimizers.Adam(1e-4)
discriminator_optimizer = tf.keras.optimizers.Adam(1e-4)

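# Define the training loop
def train_gan(generator, discriminator, gan, dataset, epochs, latent_dim, num_examples=16):
    # The tape-based loop below updates both networks directly, so `gan` is not used.
    # build_gan() freezes the discriminator; unfreeze it so its variables receive gradients.
    discriminator.trainable = True
    for epoch in range(epochs):
        for image_batch in dataset:
            # Generate random noise as input to the generator
            noise = tf.random.normal([image_batch.shape[0], latent_dim])

            with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
                # Generate images using the generator
                generated_images = generator(noise, training=True)

                # Compute the discriminator outputs
                real_output = discriminator(image_batch, training=True)
                fake_output = discriminator(generated_images, training=True)

                # Compute both losses inside the tape context
                gen_loss = generator_loss(fake_output)
                disc_loss = discriminator_loss(real_output, fake_output)

            # Apply one gradient step to each network
            gen_grads = gen_tape.gradient(gen_loss, generator.trainable_variables)
            disc_grads = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
            generator_optimizer.apply_gradients(zip(gen_grads, generator.trainable_variables))
            discriminator_optimizer.apply_gradients(zip(disc_grads, discriminator.trainable_variables))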


#### Example 6. Autoencoders: Use autoencoders for anomaly detection in credit card transactions or for reducing the dimensionality of data.

In [None]:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras import layers, models
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, classification_report

# Load the credit card transactions dataset
data = pd.read_csv('credit_card_transactions.csv')

# Separate the features and labels
X = data.drop('Class', axis=1)
y = data['Class']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the features using StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Define the autoencoder model
input_dim = X_train.shape[1]
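encoding_dim = 32  # should be smaller than input_dim so the model is forced to compress

model = models.Sequential()
model.add(layers.Dense(encoding_dim, activation='relu', input_shape=(input_dim,)))
# Linear output: standardized features can be negative, which a sigmoid cannot reproduce
model.add(layers.Dense(input_dim, activation='linear'))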

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the autoencoder
model.fit(X_train, X_train, epochs=10, batch_size=128, validation_data=(X_test, X_test))

# Use the trained autoencoder for anomaly detection
train_predictions = model.predict(X_train)
train_mse = np.mean(np.power(X_train - train_predictions, 2), axis=1)
train_threshold = np.percentile(train_mse, 95)  # Set a threshold to classify anomalies

test_predictions = model.predict(X_test)
test_mse = np.mean(np.power(X_test - test_predictions, 2), axis=1)

# Classify anomalies based on the threshold
y_pred = np.where(test_mse > train_threshold, 1, 0)

# Evaluate the performance
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))


#### Example 7. Reinforcement Learning: Implement a Deep Q-Network (DQN) to teach an agent how to play a game (Tic-Tac-Toe is used below as a small stand-in for a video game).



In [None]:
import numpy as np
import random
from tensorflow.keras import layers, models

# Tic-Tac-Toe environment
class TicTacToe:
    def __init__(self):
        self.board = np.zeros((3, 3))
        self.current_player = 1  # 1: Agent, -1: Opponent
        self.winner = None
        self.game_over = False

    def reset(self):
        self.board = np.zeros((3, 3))
        self.current_player = 1
        self.winner = None
        self.game_over = False

    def get_valid_moves(self):
        return np.argwhere(self.board == 0)

    def make_move(self, row, col):
        if self.board[row, col] != 0 or self.game_over:
            return False

        self.board[row, col] = self.current_player

        # Check for a winning move
        if self.check_winner():
            self.winner = self.current_player
            self.game_over = True
        # Check for a draw
        elif len(self.get_valid_moves()) == 0:
            self.winner = 0
            self.game_over = True
        else:
            self.current_player *= -1

        return True

    def check_winner(self):
        for player in [1, -1]:
            if np.any(np.all(self.board == player, axis=0)) or np.any(np.all(self.board == player, axis=1)) or \
                    np.all(np.diagonal(self.board) == player) or np.all(np.diagonal(np.fliplr(self.board)) == player):
                return True
        return False

    def print_board(self):
        for row in self.board:
            row_str = ['X' if val == 1 else 'O' if val == -1 else ' ' for val in row]
            print('|'.join(row_str))
            print('-----')
        print()

# Deep Q-Network (DQN) model
class DQN:
    def __init__(self):
        self.model = self.build_model()
        self.target_model = self.build_model()
        self.target_model.set_weights(self.model.get_weights())
        self.epsilon = 1.0
        self.epsilon_decay = 0.999
        self.epsilon_min = 0.01
        self.gamma = 0.99
        self.replay_buffer = []

    def build_model(self):
        model = models.Sequential()
        model.add(layers.Dense(64, input_shape=(9,), activation='relu'))
        model.add(layers.Dense(64, activation='relu'))
        model.add(layers.Dense(9, activation='linear'))
        model.compile(optimizer='adam', loss='mean_squared_error')
        return model

    def act(self, state):
        if np.random.rand() <= self.epsilon:
            return random.choice(np.argwhere(state == 0).flatten())
        else:
            q_values = self.model.predict(state[np.newaxis, :])[0]
            valid_moves = np.argwhere(state == 0).flatten()
            valid_q_values = [q_values[i] for i in valid_moves]
            return valid_moves[np.argmax(valid_q_values)]

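    def train(self, batch_size):
        if len(self.replay_buffer) < batch_size:
            return

        # Assumes the buffer holds (state, action, reward, next_state, done) tuples
        # with states stored as flat arrays of length 9
        batch = random.sample(self.replay_buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)

        states = np.array(states)
        actions = np.array(actions)
        rewards = np.array(rewards, dtype=np.float32)
        next_states = np.array(next_states)
        dones = np.array(dones, dtype=np.float32)

        # Bellman target: r + gamma * max_a' Q_target(s', a'), with no bootstrap on terminal states
        q_values = self.model.predict(states, verbose=0)
        next_q = self.target_model.predict(next_states, verbose=0)
        q_values[np.arange(batch_size), actions] = rewards + self.gamma * next_q.max(axis=1) * (1.0 - dones)
        self.model.fit(states, q_values, epochs=1, verbose=0)

        # Decay the exploration rate
        self.epsilon = max(self.epsilon_min, self.epsilon * self.epsilon_decay)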


#### Example 8. Text Generation: Train a model to generate text, such as completing a sentence or writing poetry.

In [None]:
import tensorflow as tf
import numpy as np

# Define the input text
input_text = """
This is an example text used for training a text generation model. It can be any text that you want to use as input.
"""

# Preprocess the text and create input-output pairs
text = input_text.lower()
chars = sorted(list(set(text)))
char_to_index = {char: i for i, char in enumerate(chars)}
index_to_char = {i: char for i, char in enumerate(chars)}

input_sequences = []
output_chars = []
sequence_length = 20  # keep well below len(text); the sample text above is only about 120 characters

for i in range(len(text) - sequence_length):
    input_seq = text[i:i + sequence_length]
    output_char = text[i + sequence_length]
    input_sequences.append([char_to_index[char] for char in input_seq])
    output_chars.append(char_to_index[output_char])

# Convert input sequences and output characters to numpy arrays
X = np.array(input_sequences)
y = np.array(output_chars)

# Define the text generation model
model = tf.keras.models.Sequential([
    tf.keras.layers.Embedding(len(chars), 64),  # no fixed input length: generation feeds shorter sequences
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(len(chars), activation='softmax')
])

# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# Train the model
model.fit(X, y, batch_size=128, epochs=50)

# Generate text using the trained model
start_text = "This is"
num_chars = 200

generated_text = start_text.lower()

for _ in range(num_chars):
    input_seq = [char_to_index[char] for char in generated_text[-sequence_length:]]
    input_seq = np.array(input_seq)[np.newaxis, :]
    predicted = model.predict(input_seq)
    next_index = np.argmax(predicted)
    next_char = index_to_char[next_index]
    generated_text += next_char

print(generated_text)


### Q12: What does the data science field deal with? Things like profit optimization?

The field of data science deals with extracting insights and knowledge from large and complex datasets to drive informed decision-making. It encompasses various disciplines such as statistics, mathematics, computer science, and domain expertise to analyze and interpret data.

Profit optimization is indeed one of the areas where data science can be applied. In this context, data scientists may employ statistical and machine learning techniques to analyze historical sales data, market trends, customer behavior, and other relevant factors. By understanding the relationships between different variables and identifying patterns, data scientists can develop models and algorithms to optimize business strategies and maximize profits.

Specifically, in profit optimization, data science can help with:

**[1] Demand forecasting**: By analyzing historical sales data and external factors (e.g., weather, economic indicators), data scientists can build models to forecast future demand. This information can guide decision-making related to production, inventory management, and pricing.

**[2] Pricing optimization**: Data scientists can leverage pricing data, customer preferences, competitor information, and market dynamics to develop pricing models that maximize profitability. They may use techniques like price elasticity analysis and A/B testing to determine optimal price points.

**[3] Customer segmentation and targeting**: Data science enables businesses to identify different customer segments based on various attributes such as demographics, purchasing behavior, and preferences. This segmentation helps in targeted marketing campaigns and personalized recommendations, thereby increasing the probability of generating higher profits.

**[4] Churn prediction and customer retention**: By analyzing customer data and usage patterns, data scientists can build models to predict customer churn or attrition. This information allows businesses to take proactive measures to retain customers by offering personalized incentives, targeted marketing, or improving customer experience.

**[5] Optimization of marketing campaigns**: Data science techniques, such as predictive analytics and machine learning algorithms, can be used to optimize marketing campaigns. By analyzing data on customer responses, conversion rates, and marketing channels, data scientists can identify the most effective strategies and allocate marketing budgets more efficiently.

These are just a few examples of how data science can contribute to profit optimization. The field is vast, and its applications span across industries, including finance, healthcare, retail, manufacturing, and many others.

##### 12-[1] Demand forecasting: 
By analyzing historical sales data and external factors (e.g., weather, economic indicators), data scientists can build models to forecast future demand. This information can guide decision-making related to production, inventory management, and pricing.

In this example, you would need to have three CSV files: one containing historical sales data (sales_data.csv), one with weather data (weather_data.csv), and another with economic indicators data (economic_data.csv). The sales data should include a date column and a sales column.

The code loads the data, preprocesses it, merges the exogenous variables (weather and economic indicators) with the sales data, splits the data into training and testing sets, builds a SARIMAX forecasting model, and generates future demand forecasts. Finally, it evaluates the accuracy of the forecast using the mean absolute percentage error (MAPE).

Keep in mind that this is just a basic example, and in practice, you would need to explore more sophisticated models, handle missing data, handle seasonality and trends, perform feature engineering, and conduct thorough model evaluation and validation.

In [None]:
import pandas as pd
import statsmodels.api as sm

# Load the historical sales data
sales_data = pd.read_csv('sales_data.csv')

# Preprocess the data
sales_data['Date'] = pd.to_datetime(sales_data['Date'])
sales_data = sales_data.set_index('Date')

# Resample the data to aggregate by a specific time period (e.g., monthly)
monthly_sales = sales_data['Sales'].resample('M').sum()

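# Prepare the exogenous variables (e.g., weather, economic indicators)
# Parse 'Date' as datetime so the join matches the DatetimeIndex of the sales data.
# Assumes both files hold one row per month, dated at month end like the resampled sales.
weather_data = pd.read_csv('weather_data.csv', parse_dates=['Date'])
economic_data = pd.read_csv('economic_data.csv', parse_dates=['Date'])

# Merge exogenous variables with the sales data
merged_data = monthly_sales.to_frame().join(weather_data.set_index('Date')).join(economic_data.set_index('Date'))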

# Split the data into training and testing sets
train_data = merged_data.loc[:'2022-12-31']
test_data = merged_data.loc['2023-01-01':]

# Build and train the forecasting model (e.g., SARIMAX)
model = sm.tsa.SARIMAX(train_data['Sales'], exog=train_data[['Weather', 'EconomicIndicators']], order=(1, 0, 0))
results = model.fit()

# Forecast future demand
forecast = results.get_forecast(steps=len(test_data), exog=test_data[['Weather', 'EconomicIndicators']])
forecasted_values = forecast.predicted_mean

# Evaluate the accuracy of the forecast (e.g., using mean absolute percentage error)
mape = (abs(forecasted_values - test_data['Sales']) / test_data['Sales']).mean() * 100
print('Mean Absolute Percentage Error (MAPE): {:.2f}%'.format(mape))
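
The MAPE used in the last step can also be sketched on its own with plain Python lists. The function name `mape` and the sample numbers are made up for illustration. Periods with zero actual sales are skipped here, since the percentage error is undefined for them; that handling is an assumption, not something the code above does.

```python
def mape(actual, forecast):
    """Mean absolute percentage error in percent, skipping periods where actual is 0."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100 * sum(abs(f - a) / abs(a) for a, f in pairs) / len(pairs)

# Errors of 10%, 10%, and 0% average to about 6.67%
error = mape([100, 200, 400], [110, 180, 400])
print(f"MAPE: {error:.2f}%")
```

Because MAPE divides by the actual value, it penalizes errors on low-volume periods heavily; metrics such as MAE or RMSE are common complements.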


##### 12-[2] Pricing optimization:
Data scientists can leverage pricing data, customer preferences, competitor information, and market dynamics to develop pricing models that maximize profitability. They may use techniques like price elasticity analysis and A/B testing to determine optimal price points.

The example below demonstrates how you can use linear regression to perform price elasticity analysis and optimize pricing based on customer preferences and market dynamics. Please note that this is a basic example, and in practice, you may need to consider additional factors and use more sophisticated models.

In this example, you would need two CSV files: one containing the pricing data (pricing_data.csv) and another with the corresponding demand data (demand_data.csv). The pricing data should include a column for the product price, customer preferences, competitor prices, and other relevant features. The demand data should have the corresponding demand or sales figures for each product.

The code loads the pricing and demand data, merges them based on the product, and prepares the input features (X) and target variable (y) for the linear regression model. It fits a linear regression model and obtains the coefficients, which estimate the effect of each feature on demand. It then computes the point price elasticity at each observation as the price coefficient (coefficients[0]) multiplied by price and divided by demand. Finally, it estimates the revenue-maximizing price under the fitted linear demand curve, holding the other features at their means: for linear demand, revenue peaks where the elasticity equals -1. Maximizing profit rather than revenue would also require cost data, which this example does not include.

Keep in mind that this is a basic example, and in practice, you may need to handle additional features, perform feature engineering, handle multicollinearity, evaluate model performance, and consider other factors such as market segmentation, seasonality, and competitor strategies.

In [None]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression

# Load the pricing and demand data
pricing_data = pd.read_csv('pricing_data.csv')
demand_data = pd.read_csv('demand_data.csv')

# Merge the pricing and demand data
merged_data = pricing_data.merge(demand_data, on='Product')

# Prepare the input features (price, customer preferences, competitor information, etc.)
X = merged_data[['Price', 'CustomerPreference', 'CompetitorPrice']]
y = merged_data['Demand']

# Fit a linear regression model
model = LinearRegression()
model.fit(X, y)

# Obtain the coefficients of the model
coefficients = model.coef_
intercept = model.intercept_

# Calculate the point price elasticity at each observation: (dQ/dP) * (P / Q)
price_elasticity = coefficients[0] * merged_data['Price'] / merged_data['Demand']

# Revenue-maximizing price for linear demand Q = a + b * P, other features held at their means.
# Revenue P * Q peaks at P = -a / (2 * b), where elasticity equals -1 (meaningful only if b < 0).
b = coefficients[0]
a = intercept + coefficients[1] * X['CustomerPreference'].mean() + coefficients[2] * X['CompetitorPrice'].mean()
optimal_price = -a / (2 * b)

print('Mean price elasticity: {:.2f}'.format(price_elasticity.mean()))
print('Optimal Price: {:.2f}'.format(optimal_price))
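
As a small worked check of the relationship between elasticity and pricing, consider a linear demand curve Q = a + b * P with b < 0. Revenue P * Q is then maximized at P = -a / (2 * b), which is the point where the elasticity b * P / Q equals -1. The sketch below uses made-up coefficients (a = 100, b = -2); the function names are illustrative. It addresses revenue only, since profit would also depend on costs.

```python
def revenue_maximizing_price(intercept, slope):
    """Price that maximizes P * Q for linear demand Q = intercept + slope * P."""
    if slope >= 0:
        raise ValueError("Demand must fall as price rises (slope < 0)")
    return -intercept / (2 * slope)

def point_elasticity(intercept, slope, price):
    """Price elasticity of demand at a given price: (dQ/dP) * (P / Q)."""
    quantity = intercept + slope * price
    return slope * price / quantity

def revenue(intercept, slope, price):
    return price * (intercept + slope * price)

a, b = 100.0, -2.0  # made-up demand curve: Q = 100 - 2P
best_price = revenue_maximizing_price(a, b)
elasticity_at_best = point_elasticity(a, b, best_price)
print(best_price, elasticity_at_best, revenue(a, b, best_price))
```

Prices above this point sit on the elastic part of the curve (elasticity below -1), where a price cut raises revenue; prices below it sit on the inelastic part, where a price increase does.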
