#1. What are the key tasks involved in getting ready to work with machine learning modeling?

"""Getting ready to work with machine learning modeling involves several key tasks to ensure a successful and 
   effective implementation. Here are the main steps you should consider:

   1. Problem Definition: Clearly define the problem you want to solve with machine learning. Understand the 
      business objectives, the expected outcome, and how the model will be used to address the problem.

   2. Data Collection: Gather relevant data that will be used to train and evaluate the model. Ensure that 
      the data is of high quality, sufficient in quantity, and representative of the problem domain.

   3. Data Preprocessing: Clean the data, handle missing values, and remove any noise or outliers that might 
      adversely affect the model's performance. This step is crucial for creating a reliable and accurate model.

   4. Data Exploration and Analysis: Explore the data to gain insights, understand the relationships between 
      variables, and identify patterns that might influence the model's behavior.

   5. Feature Engineering: Select or create the most relevant features (input variables) that will be used to
      train the model. Feature engineering can significantly impact the model's performance.

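   6. Data Splitting: Divide the data into training, validation, and testing sets. The training set is used to 
      train the model, the validation set is used for hyperparameter tuning, and the testing set is used to evaluate 
      the final model's performance. Fit any preprocessing statistics (such as scaling parameters) on the training 
      set only, so that no information from the validation or testing data leaks into the model.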

   7. Model Selection: Choose the appropriate machine learning algorithm or model architecture based on the problem 
      type (classification, regression, clustering, etc.) and the characteristics of the data.

   8. Model Training: Train the selected model using the training data. This involves adjusting the model's parameters 
      to minimize the error or loss function.

   9. Hyperparameter Tuning: Fine-tune the model's hyperparameters using the validation set to optimize its performance.
      This process may involve techniques like grid search or random search.

   10. Model Evaluation: Evaluate the model's performance on the testing set using appropriate evaluation metrics. 
       This step helps to determine how well the model generalizes to new, unseen data.

   11. Model Optimization: If the model's performance is not satisfactory, consider optimizing it by revisiting the
       data, features, or hyperparameters.

   12. Deployment: Once you have a satisfactory model, deploy it to the production environment to make predictions on
       new incoming data.

   13. Monitoring and Maintenance: Continuously monitor the model's performance in the production environment and 
      update it as needed. Machine learning models may require periodic retraining to adapt to changes in the data 
      distribution.

   14. Documentation: Properly document the entire process, including data sources, preprocessing steps, model
       architecture, hyperparameters, evaluation metrics, and any other relevant information. This documentation
       is essential for reproducibility and collaboration."""
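
# A minimal sketch of step 6 (data splitting), using only the standard library. The function name
# `split_dataset`, the 70/15/15 proportions, and the fixed seed are assumptions made for illustration;
# the answer above does not prescribe any particular split.

```python
import random


def split_dataset(rows, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle rows and split them into train, validation, and test lists."""
    if not 0 <= val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be in [0, 1)")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # local RNG keeps the split reproducible
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

# Shuffling before slicing avoids a split that follows the original row order. For time series data a
# chronological split would usually be the better choice.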

#2. What are the different forms of data used in machine learning? Give a specific example for each of them.

"""In machine learning, data can take various forms, and the type of data used depends on the problem and the nature
   of the variables being modeled. The main forms of data used in machine learning are:

   1. Numerical Data:
      Numerical data consists of numerical values that can be either continuous or discrete. Continuous numerical
      data can take any value within a range, while discrete numerical data only takes specific, separate values.

   Example: Housing Price Prediction
   In a housing price prediction problem, the features (input variables) might include numerical data such as the
   area of the house (continuous), the number of bedrooms (discrete), and the age of the house (continuous).

   2. Categorical Data:
      Categorical data represents characteristics or attributes that belong to a specific category or class.
      These categories are typically represented by labels or strings.

    Example: Customer Segmentation
    In a customer segmentation problem, the features might include categorical data such as the type of customer 
    (e.g., regular, premium, new), the preferred payment method (e.g., credit card, cash), and the location of the 
    customer (e.g., city A, city B).

   3. Text Data:
      Text data consists of unstructured text information, such as sentences, paragraphs, or documents. 
      Natural Language Processing (NLP) techniques are often used to process and analyze text data.

   Example: Sentiment Analysis
   In sentiment analysis, the input data could be a collection of customer reviews for a product. The model aims 
   to classify each review as positive, negative, or neutral based on the sentiment expressed in the text.

   4. Image Data:
      Image data consists of pixel values representing visual information. It is commonly used in computer vision tasks,
      such as image classification or object detection.

    Example: Handwritten Digit Recognition
    In a handwritten digit recognition problem, the data consists of images of handwritten digits (0-9). The model's
    task is to correctly identify the digit present in each image.

   5. Audio Data:
      Audio data represents sound signals, often used in speech recognition or audio classification tasks.

    Example: Speech Recognition
     In speech recognition, the data is audio recordings of spoken words or phrases. The model is trained to 
     transcribe the spoken words into written text.

   6. Time Series Data:
      Time series data is a sequence of data points collected over time at regular intervals. This data type is 
      prevalent in forecasting and trend analysis tasks.

    Example: Stock Price Prediction
    In stock price prediction, the input data is a time series of historical stock prices. The model is trained to
    forecast future stock prices based on past patterns and trends.

  Each of these data forms requires specific preprocessing and modeling techniques tailored to their characteristics 
  and the objectives of the machine learning task at hand."""

#3. Distinguish:

# 1. Numeric vs. categorical attributes

"""Numeric and categorical attributes are two fundamental types of data used in machine learning, and they have
   distinct characteristics and properties. Let's distinguish between them:

   1. Numeric Attributes:

      • Definition: Numeric attributes consist of numerical values that can be either continuous or discrete. 
        Continuous numeric attributes can take any value within a range, while discrete numeric attributes only 
        take specific, separate values.

      • Examples: Age, temperature, height, weight, income, and any other measurable quantity are examples of numeric 
        attributes.

      • Characteristics: Numeric attributes can be used for mathematical operations, such as addition, subtraction,
        multiplication, and division. They have a natural ordering due to their numerical nature, which allows for 
        the calculation of means, medians, and other statistical measures.

      • Representation: Numeric attributes are represented as numbers, and they can be encoded as real numbers
        (e.g., floating-point values) or integers.

      • Usage: Numeric attributes can serve as input features in any task. When the target itself is numeric, the 
        task is regression, such as predicting the price of a house or the temperature for a given date.
        
  2. Categorical Attributes:

     • Definition: Categorical attributes represent characteristics or attributes that belong to a specific category 
       or class. They cannot be mathematically operated on in the same way as numeric attributes.

     • Examples: Gender (e.g., male, female), color (e.g., red, blue, green), product categories (e.g., electronics,
       clothing, books), and any other discrete labels are examples of categorical attributes.

     • Characteristics: Categorical attributes have distinct categories. Nominal ones have no natural ordering
       (e.g., red is not greater than blue), while ordinal ones do (e.g., low < medium < high). Either kind can be
       encoded as numerical values through techniques like one-hot encoding.

     • Representation: Categorical attributes are represented using labels or strings, and they can be encoded into
       numerical form using techniques like one-hot encoding or label encoding.
 
     • Usage: Categorical attributes can also serve as input features in any task. When the target itself is
       categorical, the task is classification, for instance classifying emails as "spam" or "not spam" or 
       predicting a flower's species from its features.

  In summary, numeric attributes are used to represent numerical quantities and support mathematical operations, while 
  categorical attributes represent discrete categories or classes and require specific encoding techniques for use in
  machine learning models. Understanding the distinction between these two types of attributes is essential for preprocessing
  data and selecting appropriate modeling techniques."""
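
# The two encodings named above (label encoding and one-hot encoding) can be sketched in plain Python.
# The helper names `label_encode` and `one_hot_encode` are hypothetical, and sorting the categories
# alphabetically is an assumption made so that the output is deterministic.

```python
def label_encode(values):
    """Map each distinct category to an integer code (alphabetical order)."""
    codes = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes


def one_hot_encode(values):
    """Return one 0/1 indicator column per distinct category."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return rows, categories


colors = ["red", "blue", "green", "blue"]
labels, mapping = label_encode(colors)
one_hot, columns = one_hot_encode(colors)
print(labels)   # integer codes imply an order that nominal data does not have
print(one_hot)  # indicator columns avoid that implied order
```

# Label encoding is compact but introduces an artificial ordering, so one-hot encoding is usually the
# safer default for nominal attributes.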

# 2. Feature selection vs. dimensionality reduction

"""Feature selection and dimensionality reduction are techniques used in machine learning to handle the high dimensionality
   of data and improve model performance. While they both aim to reduce the number of features used for modeling, they 
   have distinct purposes and approaches:

   1. Feature Selection:

      • Definition: Feature selection is the process of selecting a subset of the most relevant and informative features
        from the original set of features. The objective is to retain only the most important features while discarding 
        irrelevant or redundant ones.

      • Purpose: The primary purpose of feature selection is to simplify the model and improve its efficiency by reducing 
        the computational complexity and training time. It also helps in mitigating the risk of overfitting, as using fewer
        features reduces the chances of the model learning noise or irrelevant patterns in the data.

      • Techniques: Feature selection techniques include methods like Univariate Feature Selection (e.g., SelectKBest, 
        SelectPercentile), Recursive Feature Elimination (RFE), and feature importance from tree-based models.

      • Example: In a dataset with 100 features, feature selection may identify and keep only the 20 most relevant 
        features, discarding the remaining 80.
        
   2. Dimensionality Reduction:

      • Definition: Dimensionality reduction is the process of transforming the original high-dimensional feature space
        into a lower-dimensional space while preserving the most important information and structure of the data.

      • Purpose: The main purpose of dimensionality reduction is to handle the "curse of dimensionality," where
        high-dimensional data can lead to increased computational complexity, overfitting, and difficulty in 
        visualizing the data.

      • Techniques: Dimensionality reduction techniques include Principal Component Analysis (PCA), t-distributed 
        Stochastic Neighbor Embedding (t-SNE, used mainly for visualization), and Linear Discriminant Analysis
        (LDA, a supervised method that uses class labels).

      • Example: In a dataset with 100 features, dimensionality reduction techniques may reduce it to a lower-dimensional 
        representation, such as 2 or 3 principal components that capture the most significant variance in the data.

  In summary, feature selection focuses on identifying and retaining the most relevant features within the original 
  feature space, while dimensionality reduction aims to transform the data into a lower-dimensional representation 
  while preserving important characteristics. Both techniques help in improving the performance and efficiency of 
  machine learning models, but their methodologies and objectives differ. Depending on the specific problem and dataset, 
  one or both of these techniques may be employed to preprocess the data before training a model."""
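
# A minimal sketch of univariate feature selection: score each feature by its absolute Pearson
# correlation with the target and keep the top k. The function `select_k_best` is a hand-rolled
# illustration (not scikit-learn's SelectKBest), and the small housing table is made-up data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def select_k_best(features, target, k):
    """Keep the k feature names with the largest |correlation| to the target."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


features = {
    "area": [50, 80, 120, 200],   # hypothetical house data
    "noise": [3, 1, 4, 1],
    "bedrooms": [1, 2, 3, 4],
}
price = [100, 160, 240, 400]
selected = select_k_best(features, price, k=2)
print(selected)
```

# The selected columns are original features, unchanged. Dimensionality reduction would instead build
# new axes as combinations of all the features.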

#4. Make quick notes on any two of the following:

# 1. The histogram

"""Histogram: Quick Notes

  - Definition: A histogram is a graphical representation of the distribution of numerical data. It shows the
    frequency or count of values falling into different bins or intervals. Each bin represents a range of values, 
    and the height of each bar in the histogram corresponds to the frequency of values within that bin.

  - Purpose: Histograms are used to visualize the underlying distribution of a dataset, revealing patterns, central 
    tendencies, and spread of data. They help identify outliers, understand data skewness, and make informed decisions
    on data preprocessing and modeling.

  - Construction: To create a histogram, follow these steps:
    1. Divide the range of data into several intervals (bins).
    2. Count the number of data points that fall into each bin.
    3. Plot the bins on the x-axis and the corresponding frequencies on the y-axis.

  - Characteristics:
    - Symmetry: Histograms can be symmetric (bell-shaped) or skewed (positively or negatively skewed).
    - Modes: Peaks in the histogram indicate modes or clusters in the data.
    - Outliers: Outliers are visible as isolated bars outside the main distribution.

  - Example: Suppose we have a dataset of exam scores ranging from 0 to 100. We divide the scores into equal-width
    bins (e.g., 0-9, 10-19, ..., 90-100, with the top score of 100 placed in the last bin) and count how many scores
    fall into each bin. The resulting histogram can help us understand the distribution of scores, identify any
    concentration of grades, and detect unusual score patterns."""
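
# The three construction steps above can be sketched without a plotting library. The function name
# `histogram_counts` and the exam scores are assumptions for illustration; bins are half-open
# intervals, except that the top edge is counted in the last bin.

```python
def histogram_counts(values, low, high, n_bins):
    """Count values in n_bins equal-width bins covering [low, high]."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        if low <= v <= high:
            # the top edge (v == high) belongs to the last bin
            idx = min(int((v - low) / width), n_bins - 1)
            counts[idx] += 1
    return counts


scores = [5, 12, 37, 45, 45, 58, 61, 67, 72, 75, 78, 83, 88, 91, 100]
counts = histogram_counts(scores, 0, 100, 10)
for i, c in enumerate(counts):
    # one text bar per bin: bins on the vertical axis, frequency as bar length
    print(f"{i * 10:3d}-{i * 10 + 10:3d} | {'#' * c}")
```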

# 2. PCA (Principal Component Analysis)
"""Principal Component Analysis (PCA): Quick Notes

  - Definition: PCA is a popular dimensionality reduction technique used to transform high-dimensional data into a 
    lower-dimensional space while preserving the most significant variance in the data.

  - Purpose: PCA is employed to address the "curse of dimensionality" and simplify data representation. It aims to 
    reduce computational complexity, improve model efficiency, and enhance data visualization capabilities.

  - Methodology: PCA finds the principal components (orthogonal axes) that capture the most significant variance in 
    the data. The first principal component explains the most variance, the second captures the second most, and so on.
    There are at most as many principal components as original features; dimensionality is reduced by keeping only
    the first k of them.

  - Steps:
    1. Center the data to have zero mean for each feature.
    2. Compute the covariance matrix of the centered data.
    3. Perform eigenvalue decomposition of the covariance matrix to obtain eigenvectors (principal components)
       and eigenvalues.
    4. Select the top k eigenvectors corresponding to the k largest eigenvalues to form the lower-dimensional subspace.

  - Use cases:
    - Data Visualization: PCA is used to visualize high-dimensional data in 2D or 3D, allowing better insight into 
      data patterns.
    - Feature Extraction: PCA can be applied in feature extraction tasks to represent data with fewer dimensions 
      while retaining essential information.

  - Example: In a dataset with multiple features, PCA can identify the principal components that explain the most
    variance. It can then reduce the dimensionality by selecting only the top principal components for subsequent
    analysis or visualization."""
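
# A sketch of the four PCA steps above, restricted to two features so that the eigen-decomposition of
# the 2x2 covariance matrix has a closed form and no external library is needed. The function name
# `pca_2d` is hypothetical; real projects would normally rely on a linear algebra library instead.

```python
import math


def pca_2d(points):
    """PCA for two-feature data: returns (eigenvalues, first axis, 1-D projections)."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]  # step 1: center

    # step 2: sample covariance matrix [[a, b], [b, c]]
    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)

    # step 3: closed-form eigenvalues of a symmetric 2x2 matrix
    mid = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = mid + radius, mid - radius

    # eigenvector for lam1 is (b, lam1 - a) unless the matrix is already diagonal
    if b != 0:
        vx, vy = b, lam1 - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    axis = (vx / norm, vy / norm)

    # step 4 with k = 1: project onto the first principal component
    projected = [x * axis[0] + y * axis[1] for x, y in centered]
    return (lam1, lam2), axis, projected


eigenvalues, axis, projected = pca_2d([(1, 1), (2, 2), (3, 3), (4, 4)])
explained = eigenvalues[0] / sum(eigenvalues)  # share of variance on the first axis
print(axis, explained)
```

# The sample points lie on the line y = x, so the first component points along that diagonal and
# carries essentially all of the variance.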

#5. Why is it necessary to investigate data? Is there a discrepancy in how qualitative and quantitative data are explored?

"""Investigating data is a crucial step in the data analysis and machine learning process. It involves exploring the 
   dataset to gain insights, understand its characteristics, identify patterns, and detect potential issues or anomalies. 
   Investigating data is necessary for several reasons:

   1. Data Quality Assurance: Investigating data helps ensure that the dataset is of high quality. It allows you to
      check for missing values, data inconsistencies, and outliers that could affect the reliability of the analysis 
      or machine learning models.

   2. Feature Selection: Exploring the data aids in selecting the most relevant features for the analysis or modeling 
      task. It helps identify features that have significant predictive power and are informative for the problem at hand.

   3. Identifying Data Patterns: Data investigation allows you to identify patterns, trends, and relationships within
      the data. This knowledge can lead to valuable insights and guide decision-making.

   4. Understanding Data Distribution: Investigating data helps understand the distribution of features and target 
      variables. It is essential for selecting appropriate modeling techniques and understanding the nature of the problem.

   5. Model Performance: Data investigation can reveal potential challenges that may affect model performance. 
      For instance, class imbalance in classification tasks can impact model accuracy and may require special handling.

   6. Data Preprocessing: Before applying machine learning algorithms, data often needs to be preprocessed. Data
      investigation helps in deciding how to handle missing values, scale features, or normalize data.

   Regarding the discrepancy in exploring qualitative and quantitative data:

  - Quantitative Data Exploration: Quantitative data, being numerical, can be summarized with statistics such as the
    mean, median, standard deviation, and correlation. Visualizations like histograms, scatter plots, and box plots
    are effective for displaying the data's numerical aspects.

  - Qualitative Data Exploration: Qualitative data, also known as categorical data, requires different exploration
    techniques. One common approach is to count the frequency of each category to understand the distribution. Bar 
    charts, pie charts, and heatmaps are useful for visualizing categorical data. Additionally, cross-tabulation and 
    chi-square tests can help identify associations between categorical variables.

  It's essential to use the appropriate tools and techniques to explore both types of data effectively. Data investigation
  helps researchers and analysts make informed decisions throughout the data analysis and modeling process, leading to more
  accurate and reliable results."""
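
# A short sketch of the contrast described above: numerical summaries for a quantitative column and
# frequency counts for a qualitative one. The `ages` and `plans` values are made-up sample data.

```python
import statistics
from collections import Counter

ages = [23, 25, 31, 35, 35, 42, 58]  # quantitative (hypothetical)
plans = ["basic", "pro", "basic", "basic", "pro", "free", "basic"]  # qualitative

# quantitative: centre and spread
numeric_summary = {
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "stdev": statistics.stdev(ages),
}

# qualitative: frequency of each category and its share of the total
category_counts = Counter(plans)
category_shares = {k: v / len(plans) for k, v in category_counts.items()}

print(numeric_summary)
print(category_counts.most_common())
```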

#6. What are the various histogram shapes? What exactly are 'bins'?

"""Histograms can exhibit various shapes, each representing different types of data distributions. The main histogram
   shapes are:

   1. Uniform Distribution: In a uniform distribution, all values across the range occur with approximately the same
      frequency, and the histogram appears flat. Each bin has roughly the same number of data points, indicating 
      that the data is evenly spread across the range.

   2. Normal Distribution (Bell-shaped): The normal distribution is characterized by a symmetric, bell-shaped curve. 
      The majority of data points are concentrated around the mean, with fewer points in the tails. The histogram's peak
      represents the mode of the data.

   3. Skewed Distribution:
      a. Positively Skewed (Right-skewed): In a positively skewed distribution, the tail extends towards the higher
         values, and the peak is shifted to the left. This occurs when there are more low values and a few extreme 
         high values.
      b. Negatively Skewed (Left-skewed): In a negatively skewed distribution, the tail extends towards the lower 
         values, and the peak is shifted to the right. This happens when there are more high values and a few extreme 
         low values.

   4. Bimodal Distribution: A bimodal distribution has two distinct peaks, which often suggests that the data comes
      from two different populations or processes.

   5. Multimodal Distribution: A multimodal distribution has multiple peaks, signifying that the data may arise from
      several different sources or subgroups.

   6. Exponential Distribution: An exponential distribution displays a rapidly decreasing frequency as values increase. 
      It is often used to model data with a constant hazard rate, commonly seen in time-to-failure data.

   Bins:
   In a histogram, bins are the intervals into which the data range is divided. The data points are grouped within these
   intervals, and the height of each bar in the histogram represents the frequency or count of data points falling into
   each bin. The number of bins in a histogram is a critical parameter that can influence the appearance and interpretability
   of the distribution.

   Choosing the right number of bins is important to accurately visualize the data's distribution. Too few bins can 
   oversimplify the representation, smoothing out important details, while too many bins can lead to noise and make
   it difficult to interpret the overall pattern.

   There are various methods to determine the number of bins, such as Sturges' formula, Scott's rule, and 
   the Freedman-Diaconis rule. Sturges' formula depends only on the sample size, while Scott's rule and the
   Freedman-Diaconis rule combine the sample size with a measure of spread (the standard deviation and the
   interquartile range, respectively) to calculate a bin width."""
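
# The three rules named above, written out as a sketch. The function names are hypothetical. Sturges'
# formula is taken as ceil(log2(n)) + 1, Scott's rule as 3.49 * s * n^(-1/3), and the
# Freedman-Diaconis rule as 2 * IQR * n^(-1/3); quartiles come from `statistics.quantiles` with its
# default method, so other quartile conventions may give slightly different widths.

```python
import math
import statistics


def sturges_bins(n):
    """Sturges' formula: number of bins from the sample size alone."""
    return math.ceil(math.log2(n)) + 1


def scott_width(values):
    """Scott's rule: bin width from the sample standard deviation."""
    return 3.49 * statistics.stdev(values) * len(values) ** (-1 / 3)


def freedman_diaconis_width(values):
    """Freedman-Diaconis rule: bin width from the interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return 2 * (q3 - q1) * len(values) ** (-1 / 3)


def bins_from_width(values, width):
    """Convert a bin width into a bin count over the data range."""
    return max(1, math.ceil((max(values) - min(values)) / width))


data = list(range(1, 101))  # 100 evenly spread values, for illustration
print(sturges_bins(len(data)))
print(bins_from_width(data, scott_width(data)))
print(bins_from_width(data, freedman_diaconis_width(data)))
```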

#7. How do we deal with data outliers?

"""Dealing with data outliers is essential in data analysis and machine learning to ensure that extreme values do
   not unduly influence the results or model performance. Outliers can arise due to various reasons, such as measurement 
   errors, data entry mistakes, or genuinely rare events. There are several approaches to handle data outliers:

   1. Identifying Outliers:
      - Visual Inspection: Plotting the data using scatter plots, box plots, or histograms can help identify outliers
        visually.
      - Statistical Methods: Outliers can be detected using statistical techniques such as the z-score (measuring how
        many standard deviations a data point is from the mean) or the interquartile range (IQR) method.

   2. Removing Outliers:
      - Removing Outliers Directly: If the outliers are few and clearly result from errors (for example, impossible
        measurements or data entry mistakes), it may be reasonable to remove them from the dataset. However, this
        approach should be used with caution, as removing genuine observations can lead to information loss and
        biased results.

   3. Transforming Data:
      - Winsorizing: Winsorization replaces extreme values with less extreme ones. For example, values above the 
        95th percentile are set to the 95th-percentile value, and values below the 5th percentile are set to the
        5th-percentile value.
      - Log Transformation: Applying a log transformation to skewed data can compress extreme values and bring them closer
        to the central values, reducing the influence of outliers.

   4. Binning or Discretization:
      - Binning involves dividing the data into intervals and replacing the values with the interval's midpoint or 
        average. This can help reduce the impact of individual extreme values.

   5. Robust Statistical Methods:
      - Instead of using traditional statistical techniques that are sensitive to outliers (e.g., mean and standard
        deviation), robust statistical methods like median and median absolute deviation (MAD) can be used, which 
        are less affected by extreme values.

   6. Data Imputation:
      - If the outliers represent missing or erroneous data, they can be imputed with more plausible values based on 
        the context and distribution of the remaining data.

   7. Modeling Techniques:
      - Some machine learning algorithms are less sensitive to outliers than others. For instance, tree-based models
        like Random Forest and Gradient Boosting Machines split on thresholds rather than distances, so extreme
        feature values tend to have limited influence on them.

   8. Data Segmentation:
      - In some cases, it may be appropriate to create separate models for different segments of the data, treating 
        outliers and non-outliers differently.

   The choice of the approach to deal with outliers depends on the data, the problem at hand, and the impact of outliers
   on the analysis or modeling task. It is crucial to carefully consider the implications of handling outliers and to 
   document any preprocessing steps taken to ensure transparency and reproducibility."""
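
# A sketch of the IQR method for detecting outliers, followed by a Winsorizing-style treatment that
# clips values to the IQR fences. The function names and the multiplier k = 1.5 are assumptions (1.5
# is the conventional choice); classic Winsorizing clips at chosen percentiles instead of at the fences.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Lower and upper fences: Q1 - k * IQR and Q3 + k * IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Values lying outside the fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if v < low or v > high]


def clip_to_fences(values, k=1.5):
    """Pull extreme values back to the nearest fence instead of dropping them."""
    low, high = iqr_bounds(values, k)
    return [min(max(v, low), high) for v in values]


data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]  # 102 is a suspicious reading
print(iqr_bounds(data))
print(find_outliers(data))
print(clip_to_fences(data))
```

# Clipping keeps the row in the dataset, which matters when the other columns of that row are still
# informative.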

#8. What are the various central inclination measures? Why does mean vary too much from median in certain data sets?

"""Central inclination measures, also known as measures of central tendency, are statistics that represent the central 
   or typical value of a dataset. They provide insights into where the data is centered. The main central inclination
   measures are:

   1. Mean: The mean, also called the arithmetic average, is the sum of all values in the dataset divided by the number
      of data points. It is calculated using the formula: Mean = (Sum of all values) / (Number of data points).

   2. Median: The median is the middle value when the data is arranged in ascending or descending order. If the number 
      of data points is odd, the median is the middle value. If the number of data points is even, the median is the 
      average of the two middle values.

   3. Mode: The mode is the value that appears most frequently in the dataset. A dataset can have one mode (unimodal), 
      two modes (bimodal), or more (multimodal).

   The mean and median are the most commonly used central inclination measures. The mode is particularly useful for
   categorical data.

  **Variation between Mean and Median**:

  The mean and median can differ significantly in certain data sets, particularly when the data distribution is skewed. 
  Skewness refers to the asymmetry of the data distribution, where it is stretched more to one side than the other. 
  There are two main types of skewness:

  1. Positive Skewness (Right Skew): In a positively skewed distribution, the tail is elongated towards the higher values.
     This means that the data has a few extreme high values that pull the mean towards the right, making it higher than
     the median.

  2. Negative Skewness (Left Skew): In a negatively skewed distribution, the tail is elongated towards the lower values.
     This occurs when the data has a few extreme low values that drag the mean towards the left, making it lower than 
     the median.

  Example of Skewness:
  Consider a dataset representing the incomes of people in a country. Most people have moderate incomes, but there are
  a few individuals with extremely high incomes, such as billionaires. In this case, the income distribution would be 
  positively skewed, leading to a mean significantly higher than the median.

  The presence of outliers in a dataset can also affect the mean more than the median. Since the mean takes into account
  the exact value of each data point, outliers can disproportionately influence the mean value, especially in smaller
  datasets. In contrast, the median is less sensitive to outliers, as it only considers the middle value(s) and is 
  less affected by extreme values.

  It's essential to consider both the mean and median (and other central tendency measures) when analyzing data to
  better understand the underlying distribution and account for any skewness or extreme values."""
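
# The income example above, in numbers. The values are made up (in thousands) and chosen so that a
# single extreme income dominates the mean.

```python
import statistics

incomes = [28, 30, 32, 35, 38, 40, 45, 2000]  # hypothetical, in thousands

mean_income = statistics.mean(incomes)      # pulled upward by the single 2000
median_income = statistics.median(incomes)  # average of the two middle values
mode_example = statistics.mode(["a", "b", "b", "c"])  # most frequent category

print(mean_income, median_income, mode_example)
```

# Seven of the eight incomes are below 50, yet the mean lands far above all of them, while the median
# stays among the typical values.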

#9. Describe how a scatter plot can be used to investigate bivariate relationships. Is it possible to find outliers using 
# a scatter plot?

"""A scatter plot is a graphical representation of bivariate data, displaying the relationship between two variables.
   It is an effective visualization tool to investigate the correlation and pattern between two quantitative variables. 
   Each data point in the scatter plot represents a pair of values from the two variables, and the plot shows their
   corresponding position on the x-axis and y-axis.

   **Using a Scatter Plot to Investigate Bivariate Relationships**:

   1. Identifying Patterns: A scatter plot helps visualize the relationship between the two variables. If the points
      tend to form a pattern or trend, it indicates a potential correlation between the variables. Common patterns 
      include linear, quadratic, exponential, or no apparent relationship.

   2. Strength and Direction of Correlation: The scatter plot provides insight into the strength and direction of the 
      correlation between the variables. If the points cluster around a straight line from the bottom-left to the top-right,
      it suggests a positive correlation (as one variable increases, the other tends to increase). If the points cluster 
      around a straight line from the top-left to the bottom-right, it suggests a negative correlation (as one variable 
      increases, the other tends to decrease).

   3. Outlier Detection: Scatter plots can help identify outliers, which are data points that deviate significantly
      from the general pattern. Outliers may represent data entry errors or rare events. They are often visually distinct
      from the majority of the data points, lying far away from the main cluster.

   4. Data Distribution: Scatter plots also provide an idea of the distribution of data points. The spread and 
      concentration of points can give insights into data density and potential data skewness.

   **Finding Outliers using a Scatter Plot**:

   Yes, scatter plots can be used to detect outliers visually. Outliers are data points that fall far away from the 
   main cluster of points. They may lie in regions of the plot that are distant from the majority of data points or 
   exhibit a different pattern compared to the rest of the data.

   In a scatter plot, outliers can be identified as data points that:
   - Are located far away from the majority of the points along one or both axes.
   - Show an unusual pattern or behavior compared to the general trend of the data.

   Outliers may indicate data entry errors, measurement anomalies, or genuinely rare events. If outliers are found,
   it is essential to investigate their origin and potential impact on the analysis or machine learning models. 
   Depending on the situation and the nature of the outliers, different strategies such as data cleaning, data 
   transformation, or model robustness enhancement can be employed to handle outliers effectively."""
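
# One way to turn "far from the general trend" into numbers is to fit a straight line and flag points
# with unusually large residuals. This is a sketch that goes beyond visual inspection: the function
# names, the linear-trend assumption, and the threshold of 2 residual standard deviations are all
# choices made for illustration.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def residual_outliers(xs, ys, threshold=2.0):
    """Points whose residual exceeds `threshold` residual standard deviations."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    spread = statistics.pstdev(residuals)
    return [(x, y) for x, y, r in zip(xs, ys, residuals)
            if spread and abs(r) > threshold * spread]


xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2, 4, 6, 8, 30, 12, 14, 16, 18, 20]  # (5, 30) breaks the y = 2x trend

slope, intercept = fit_line(xs, ys)
flagged = residual_outliers(xs, ys)
print(slope, intercept, flagged)
```

# The positive slope matches the bottom-left to top-right pattern described above, and the flagged
# point is the one that would sit visibly off the line in the plot.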

#10. Describe how cross-tabs can be used to figure out how two variables are related.

"""Cross-tabulation, commonly known as cross-tabs or contingency tables, is a powerful method used to explore the 
   relationship between two categorical variables. It provides a tabular representation of the joint distribution 
   of the two variables, showing how their categories intersect.

   **Creating a Cross-Tabulation Table**:

   To create a cross-tabulation table, follow these steps:

   1. Select the two categorical variables of interest.

   2. List all unique categories of the first variable as rows and the unique categories of the second variable as columns.

   3. Count the occurrences of each combination of categories (intersection) and place the counts in the corresponding
     cells of the table.

  **Using Cross-Tabs to Investigate the Relationship**:

  Once you have the cross-tabulation table, you can examine the relationship between the two variables:

  1. Frequency Distribution: The table provides a frequency distribution of the joint occurrences of the two variables. 
     It shows how often each combination of categories occurs in the dataset.

  2. Conditional Probabilities: You can calculate conditional probabilities based on the cross-tabulation. This involves
     dividing a cell count by its row total (or column total) to find the probability of one variable's category
     given the category of the other variable.

  3. Identifying Associations: Cross-tabs allow you to observe if there are any apparent associations or dependencies
     between the two categorical variables. If the row-wise (or column-wise) proportions differ noticeably from one
     row (or column) to the next, it suggests an association; if they are roughly the same, the variables look independent.

  4. Chi-Square Test: A statistical test called the chi-square test can be applied to cross-tabs to determine whether
     the observed associations are statistically significant or just due to chance.

  5. Visualization: You can use graphical representations, such as stacked bar charts or heatmaps, to visualize the 
     cross-tabulation table and make the relationships between categories more apparent.

  **Example**:

  Consider a survey dataset with two categorical variables: "Gender" and "Interest in Technology." The cross-tabulation
  table might look like this:

```
                Interest in Technology
Gender          Yes      No       Maybe
----------------------------------------
Male            50       30       20
Female          40       25       15
Non-Binary      10       5        3
```

  From the cross-tabulation, you can observe the distribution of responses for each gender. In raw counts, more males
  answered "Yes" (50) than females (40) or non-binary individuals (10), but the groups differ in size, so proportions
  are the fairer comparison. The conditional probability of a male being interested in technology is
  50/(50+30+20) = 0.50, and for females it is 40/(40+25+15) = 0.50, so in this table the two groups are equally interested.

  Cross-tabulation is a valuable tool for exploring and understanding relationships between categorical variables,
  making it a fundamental technique in exploratory data analysis and hypothesis testing."""
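
# A sketch that applies the ideas above to the example table: row-wise conditional proportions and the
# Pearson chi-square statistic. The function names are hypothetical, and the statistic is computed by
# hand from the usual (observed - expected)^2 / expected formula. Turning it into a p-value needs the
# chi-square distribution with (rows - 1) * (columns - 1) degrees of freedom, which is not done here.

```python
table = {
    "Male":       {"Yes": 50, "No": 30, "Maybe": 20},
    "Female":     {"Yes": 40, "No": 25, "Maybe": 15},
    "Non-Binary": {"Yes": 10, "No": 5, "Maybe": 3},
}


def row_proportions(table):
    """P(column category | row category): each cell divided by its row total."""
    return {row: {col: n / sum(cells.values()) for col, n in cells.items()}
            for row, cells in table.items()}


def chi_square_statistic(table):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    row_totals = {r: sum(cells.values()) for r, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    grand = sum(row_totals.values())
    stat = 0.0
    for r, cells in table.items():
        for col, observed in cells.items():
            # expected count if the two variables were independent
            expected = row_totals[r] * col_totals[col] / grand
            stat += (observed - expected) ** 2 / expected
    return stat


proportions = row_proportions(table)
chi2 = chi_square_statistic(table)
print(proportions["Male"]["Yes"], proportions["Female"]["Yes"])
print(chi2)
```

# The row proportions are nearly identical across genders, so the statistic comes out small, which is
# what independence between the two variables would look like.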