**Loading the Dataset**

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed.
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python

import os

import numpy as np               # linear algebra
import pandas as pd              # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt  # plotting
import seaborn as sns            # statistical visualization

# Input data files are available in the read-only "../input/" directory.
# Running this cell (click Run or press Shift+Enter) lists all files under the input directory.
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/), which is preserved as output when you create a version using "Save & Run All".
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session.

In [None]:
df = pd.read_csv('/kaggle/input/investment-dataset/Data_set 2 - Copy.csv')
df

**Cleaning and Preparation of data**

**Here’s a brief explanation of each column in the investment dataset:**


gender: Indicates the gender of the respondent (Male/Female/Other). This column can help analyze how investment preferences differ by gender.

age: The age of the respondent. Age can influence investment choices and risk appetite, making it a key factor in understanding investment behavior.

Investment_Avenues: Describes various avenues the respondent considers for investments (e.g., stocks, bonds, real estate). This column helps identify the popular investment options among respondents.

Mutual_Funds: Indicates whether the respondent invests in mutual funds (Yes/No). This is important for analyzing the popularity and trust in mutual funds as an investment choice.

Equity_Market: Shows whether the respondent invests in the equity market (Yes/No). This helps to gauge the engagement level with stock markets.

Debentures: Indicates if the respondent invests in debentures (Yes/No). This column assesses interest in fixed-income investments.

Government_Bonds: Shows if the respondent invests in government bonds (Yes/No). This can indicate a preference for safer investment options.

Fixed_Deposits: Indicates whether the respondent uses fixed deposits for investment (Yes/No). Fixed deposits are popular for conservative investors seeking guaranteed returns.

PPF: Shows if the respondent invests in a Public Provident Fund (Yes/No). PPF is a long-term savings scheme backed by the government, appealing to risk-averse investors.

Gold: Indicates whether the respondent invests in gold (Yes/No). Gold is traditionally seen as a safe-haven asset and is important in understanding cultural investment preferences.

Stock_Market: Shows if the respondent participates in the stock market (Yes/No). This helps to analyze overall market participation and investor sentiment.

Factor: Refers to factors influencing investment decisions (e.g., risk, return, liquidity). Understanding these factors can provide insights into investor motivations.

Objective: The primary objective of the respondent's investments (e.g., wealth accumulation, retirement, education). This helps tailor investment products to meet customer needs.

Purpose: Describes the specific purpose for which the respondent is investing (e.g., vacation, home purchase). This can help in targeting investment products effectively.

Duration: Indicates the investment horizon or duration preferred by the respondent (e.g., short-term, long-term). This is crucial for aligning investment products with investor timelines.

Invest_Monitor: Describes how often the respondent monitors their investments (e.g., daily, weekly, monthly). This can influence the types of investments chosen based on the level of engagement.

Expect: The respondent's expectations from their investments (e.g., high returns, steady income). Understanding expectations can help in product positioning.

Avenue: Specifies the preferred investment avenue (e.g., stocks, bonds, real estate). This highlights where the respondent feels most comfortable investing.

What are your savings objectives?: Open-ended responses detailing the individual's savings goals. This qualitative data can provide rich insights into customer motivations.

Reason_Equity: Reasons cited by the respondent for investing in equities (e.g., high returns, capital appreciation). This helps understand motivations behind equity investments.

Reason_Mutual: Similar to Reason_Equity, this shows reasons for investing in mutual funds. It provides insights into perceptions of mutual funds.

Reason_Bonds: Explains why the respondent chooses to invest in bonds (e.g., safety, fixed returns). Understanding these reasons can inform marketing strategies.

Reason_FD: Indicates reasons for choosing fixed deposits (e.g., guaranteed returns, safety). This is vital for understanding the appeal of conservative investments.

Source: Refers to the source of information or advice for investment decisions (e.g., financial advisor, online research). This can highlight the influence of different information sources on investment choices.

Duration_Numeric: A numeric representation of the investment duration (e.g., in years). This allows for quantitative analysis of investment time horizons.

Expect_Numeric: A numeric representation of the respondent's expectations from their investments (e.g., expected return percentage). This can facilitate quantitative assessments of investor expectations.
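This section is titled cleaning and preparation, but the cells that follow only inspect the data. Assuming, as the descriptions above state, that the investment columns hold Yes/No text, one possible preparation step is mapping those answers to booleans so they can be summed or averaged. This is a minimal sketch: the column list and the helper name `yes_no_to_bool` are illustrative assumptions, so check them against `df.columns` before use. The demo frame is invented.

```python
import pandas as pd

# Columns described above as Yes/No flags (assumption taken from the descriptions;
# adjust if the actual file differs).
YES_NO_COLUMNS = ['Mutual_Funds', 'Equity_Market', 'Debentures', 'Government_Bonds',
                  'Fixed_Deposits', 'PPF', 'Gold', 'Stock_Market']


def yes_no_to_bool(frame, columns):
    """Return a copy of `frame` with Yes/No text columns mapped to nullable booleans.

    Values other than 'yes'/'no' (after trimming and lower-casing) become <NA>,
    so unexpected entries show up in isnull() checks instead of being hidden.
    Columns missing from `frame` are skipped.
    """
    out = frame.copy()
    for col in columns:
        if col in out.columns:
            out[col] = (out[col].astype(str).str.strip().str.lower()
                        .map({'yes': True, 'no': False})
                        .astype('boolean'))
    return out


# Invented toy data standing in for df
demo = pd.DataFrame({'Gold': [' Yes', 'no', None], 'age': [30, 40, 50]})
converted = yes_no_to_bool(demo, YES_NO_COLUMNS)
print(converted)
print(converted['Gold'].mean())  # share of 'Yes' among valid answers
```

On the real data this would be `df_clean = yes_no_to_bool(df, YES_NO_COLUMNS)`, after which `df_clean[YES_NO_COLUMNS].mean()` gives the share of respondents answering Yes in each column. The original `df` is left unchanged.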

In [None]:
df.info()

In [None]:
df.isnull().sum()

In [None]:
df.columns

In [None]:
print(df.head())

In [None]:
df.describe()
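The inspection cells above each answer one question (types, missing values, column names, sample rows, numeric statistics). As a sketch, the per-column facts can be gathered into a single table. The helper name `column_summary` and the toy data are hypothetical; on the real data you would call `column_summary(df)`.

```python
import pandas as pd


def column_summary(frame):
    """One row per column: dtype, non-null count, missing count and share, distinct values."""
    return pd.DataFrame({
        'dtype': frame.dtypes.astype(str),
        'non_null': frame.notna().sum(),
        'missing': frame.isna().sum(),
        'missing_pct': (frame.isna().mean() * 100).round(1),
        'n_unique': frame.nunique(dropna=True),  # distinct non-missing values
    })


# Invented toy data standing in for df
demo = pd.DataFrame({'gender': ['Male', 'Female', None], 'age': [34, 28, 41]})
summary = column_summary(demo)
print(summary)
```

Columns with a high `missing_pct` or an unexpected `dtype` (for example, ages stored as text) are the ones to clean before plotting.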

**EDA, Visualization, and Preparation of Questions for Analysis**

**1. What is the gender distribution of the respondents?**

In [None]:
gender_column = df["gender"]
gender_counts = gender_column.value_counts()
plt.figure(figsize=(8, 6))
gender_counts.plot(kind='bar', color=['blue', 'red'])
plt.title('Gender Distribution')
plt.xlabel('Gender')
plt.ylabel('Count')
plt.xticks(rotation=0)
plt.show()

In [None]:
plt.figure(figsize=(8, 6))
gender_counts.plot(kind='pie', autopct='%1.1f%%', colors=['blue', 'red'], startangle=90, legend=True)
plt.title('Gender Distribution')
plt.ylabel('')
plt.show()
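The two charts above show how many respondents of each gender there are, but not what each group invests in. A minimal sketch of both follows: `value_counts(normalize=True)` gives the percentages behind the pie chart, and a row-normalized `pd.crosstab` of `gender` against `Avenue` shows, within each gender, the share preferring each avenue. The toy rows are invented; on the real data, replace `demo` with `df`.

```python
import pandas as pd

# Invented toy data standing in for df
demo = pd.DataFrame({
    'gender': ['Male', 'Male', 'Female', 'Female', 'Male'],
    'Avenue': ['Equity', 'Mutual Fund', 'Fixed Deposits', 'Mutual Fund', 'Equity'],
})

# Percentage of respondents per gender (the numbers behind the pie chart)
gender_share = demo['gender'].value_counts(normalize=True).mul(100).round(1)

# Within each gender (row), the percentage preferring each avenue (column)
avenue_by_gender = (pd.crosstab(demo['gender'], demo['Avenue'], normalize='index')
                    .mul(100).round(1))
print(gender_share)
print(avenue_by_gender)
```

Normalizing by row rather than by total matters when the genders are unequally represented: raw counts would mostly reflect group size, not preference.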

**2. How often is each investment avenue or platform chosen?**

In [None]:
investment_avenues_column = df['Avenue']
investment_avenue_counts = investment_avenues_column.value_counts()
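most_frequent_investment_avenue = investment_avenue_counts.idxmax()
highest_frequency = investment_avenue_counts.max()
# Report the top avenue, which was computed but never displayed
print(f"Most frequent investment avenue: {most_frequent_investment_avenue} "
      f"({highest_frequency} respondents)")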
colors = ['skyblue', 'lightgreen', 'coral', 'orchid', 'gold', 'lightblue', 'lightpink', 'lightgrey']

plt.figure(figsize=(10, 6))
investment_avenue_counts.plot(kind='bar', color=colors[:len(investment_avenue_counts)])
plt.title('Frequency of Investment Avenues')
plt.xlabel('Investment Avenue')
plt.ylabel('Frequency')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()  # Adjust layout to ensure everything fits without overlap
plt.show()
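This count-then-bar-chart pattern recurs in the cells for savings objectives, sources, and expectations. A sketch of one way to factor it into a reusable helper follows; the name `plot_value_counts` and its parameters are hypothetical, and the styling choices simply mirror the cells in this notebook. On the real data you might call `plot_value_counts(df['Avenue'], 'Frequency of Investment Avenues', 'Investment Avenue')`.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_value_counts(series, title, xlabel, ylabel='Count', ax=None):
    """Bar-chart the frequency of each distinct value in `series` and return the counts."""
    counts = series.value_counts()
    if ax is None:
        _, ax = plt.subplots(figsize=(10, 6))
    counts.plot(kind='bar', ax=ax, color='skyblue', edgecolor='black')
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.tick_params(axis='x', labelrotation=45)
    ax.figure.tight_layout()  # keep rotated labels inside the figure
    return counts


# Invented toy data standing in for df['Avenue']
demo = pd.Series(['Mutual Fund', 'Equity', 'Mutual Fund', 'Fixed Deposits'])
counts = plot_value_counts(demo, 'Frequency of Investment Avenues', 'Investment Avenue')
print(counts)
plt.show()
```

Returning the counts as well as drawing the chart lets each analysis cell print or reuse the numbers without calling `value_counts()` a second time.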

**3. How often does each answer to 'What are your savings objectives?' occur?**

In [None]:
column_name = 'What are your savings objectives?'
value_counts = df[column_name].value_counts()
print(f"Counts of unique values in the '{column_name}' column:")
print(value_counts)
plt.figure(figsize=(8, 6))
sns.barplot(x=value_counts.index, y=value_counts.values, palette='viridis')
plt.xlabel('Savings Objectives')
plt.ylabel('Counts')
plt.title(f'Counts of Unique Values in the {column_name} Column')
plt.xticks(rotation=45, ha='right')
plt.show()


**4. Which information sources do people rely on when deciding to invest?**

In [None]:
column_name = 'Source'
all_sources = df[column_name].dropna().str.split(',').explode().str.strip()
source_counts = all_sources.value_counts()
print(f"Counts of unique values in the '{column_name}' column:")
print(source_counts)
plt.figure(figsize=(10, 6))
sns.barplot(x=source_counts.index, y=source_counts.values, palette='viridis')
plt.xlabel('Information Sources')
plt.ylabel('Counts')
plt.title(f'Counts of Unique Values in the {column_name} Column')
plt.xticks(rotation=45, ha='right')
plt.show()
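The cell above treats each `Source` entry as a possibly comma-separated list: `str.split(',')` turns each answer into a list, `explode()` gives each item its own row, and `str.strip()` removes the spaces left after the commas. The toy answers below are invented to show the mechanics; whether the real column actually contains multi-select answers is not shown here. Because one respondent can name several sources, the sketch also divides by the number of respondents rather than the number of mentions.

```python
import pandas as pd

# Invented multi-select answers standing in for df['Source']
sources = pd.Series(['Internet, Television', 'Financial Consultants', None, 'Internet'])

# dropna -> split on commas -> one row per item -> trim whitespace -> count
tokens = sources.dropna().str.split(',').explode().str.strip()
source_counts = tokens.value_counts()

# Percentage of respondents (not of mentions) citing each source;
# these can sum to more than 100 when answers list several sources.
share_of_respondents = (source_counts / sources.notna().sum() * 100).round(1)
print(source_counts)
print(share_of_respondents)
```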

**5. What range of returns do people expect from their investments?**

In [None]:
expectation_counts = df['Expect'].value_counts()
print("Counts of each expectation range:")
print(expectation_counts)
expectation_counts.plot(kind='bar', figsize=(10, 6), color='skyblue', edgecolor='black')
plt.title('Counts of Common Expectations from Investments')
plt.xlabel('Expectation Ranges')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
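`value_counts()` orders the bars by frequency, so percentage ranges can appear out of numeric order. The sketch below assumes the `Expect` labels look like `'10%-20%'` (an assumption, as are the helper names `range_midpoint` and `_sort_key`), sorts the ranges by their midpoint, and maps each label to a midpoint as one possible numeric encoding. It is not necessarily how the dataset's own `Expect_Numeric` column was produced. Labels using an en dash or another separator would need the pattern adjusted.

```python
import re

import pandas as pd


def range_midpoint(label):
    """Midpoint of a label such as '10%-20%'; None if no 'low-high' pair is found."""
    match = re.search(r'(\d+(?:\.\d+)?)\s*%?\s*-\s*(\d+(?:\.\d+)?)', str(label))
    if match is None:
        return None
    low, high = map(float, match.groups())
    return (low + high) / 2


def _sort_key(label):
    mid = range_midpoint(label)
    return (mid is None, mid if mid is not None else 0.0)  # unparseable labels go last


# Invented labels in the assumed 'low%-high%' format, standing in for df['Expect']
expect = pd.Series(['20%-30%', '10%-20%', '20%-30%', '30%-40%'])
expect_counts = expect.value_counts()

# Bars in numeric order of the range instead of by frequency
ordered = expect_counts.loc[sorted(expect_counts.index, key=_sort_key)]
expect_numeric = expect.map(range_midpoint)
print(ordered)
print(expect_numeric.tolist())
```

Plotting `ordered` instead of `expectation_counts` keeps the x-axis in ascending order of expected return.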

**6. What are the main reasons for investing in different avenues?**

In [None]:
reasons = df[['Reason_Equity', 'Reason_Mutual', 'Reason_Bonds', 'Reason_FD']]
reasons_count = reasons.apply(pd.Series.value_counts).fillna(0)
reasons_count.plot(kind='bar', stacked=True, figsize=(12, 6))
plt.title('Reasons for Investing in Different Avenues')
plt.xlabel('Reasons')
plt.ylabel('Number of Respondents')
plt.legend(title='Investment Avenues')
plt.xticks(rotation=45)
plt.show()
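In the stacked chart above, each bar is a reason and each colour is a `Reason_*` column. If the columns use different answer sets, most segments are zero and the chart is hard to read. A sketch of an alternative is to reshape the four columns into long format with `melt` and report the most common reason per avenue. The toy answers are invented; on the real data, use `reasons = df[['Reason_Equity', 'Reason_Mutual', 'Reason_Bonds', 'Reason_FD']]`.

```python
import pandas as pd

# Invented toy answers standing in for the four Reason_* columns
reasons = pd.DataFrame({
    'Reason_Equity': ['Capital Appreciation', 'Dividend', 'Capital Appreciation'],
    'Reason_Mutual': ['Better Returns', 'Tax Benefits', 'Tax Benefits'],
    'Reason_Bonds': ['Safe Investment', 'Safe Investment', None],
    'Reason_FD': ['Risk Free', 'Fixed Returns', 'Risk Free'],
})

# Long format: one row per (avenue column, stated reason), missing answers dropped
long_reasons = (reasons.melt(var_name='Avenue', value_name='Reason')
                .dropna(subset=['Reason']))

# Count each reason within each avenue, then keep the most common one per avenue
reason_counts = (long_reasons.groupby(['Avenue', 'Reason']).size()
                 .rename('count').reset_index())
top_reason = (reason_counts.sort_values('count', ascending=False, kind='stable')
              .drop_duplicates('Avenue')
              .set_index('Avenue')
              .sort_index())
print(top_reason)
```

Ties are broken by whichever reason `groupby` listed first (alphabetical order), because the sort is stable.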


**Conclusion**
The analysis of the investment dataset reveals key insights into investor behavior and preferences. A significant proportion of respondents favor traditional investment avenues such as mutual funds and fixed deposits, reflecting a preference for stability and lower risk. Gender differences indicate that men are more inclined towards equity markets, while women tend to favor fixed-income options. Age plays a critical role, with younger investors showing a higher appetite for riskier investments compared to older respondents who prefer safer, long-term options. Additionally, expectations regarding returns vary widely, with many investors seeking moderate gains rather than high-risk, high-reward opportunities. Overall, understanding these dynamics can inform tailored investment strategies that cater to diverse investor profiles and preferences.
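The statements about gender and age in this conclusion are not computed by the cells shown in this notebook. As a sketch of how the age claim could be checked, ages can be grouped into bands with `pd.cut` and cross-tabulated against `Avenue`. The band edges and labels here are hypothetical choices and the rows are invented. On the real data, replace `demo` with `df`, and consider the `Equity_Market` or `Stock_Market` columns as the "riskier" indicator.

```python
import pandas as pd

# Invented toy data standing in for df
demo = pd.DataFrame({
    'age': [22, 27, 35, 41, 58, 63],
    'Avenue': ['Equity', 'Equity', 'Mutual Fund', 'Mutual Fund',
               'Fixed Deposits', 'Public Provident Fund'],
})

# Hypothetical right-closed bands: (0, 30], (30, 45], (45, 120]
bands = pd.cut(demo['age'], bins=[0, 30, 45, 120],
               labels=['up to 30', '31-45', '46 and over'])

# Within each age band (row), the percentage choosing each avenue (column)
avenue_by_age = pd.crosstab(bands, demo['Avenue'], normalize='index').mul(100).round(1)
print(avenue_by_age)
```

If younger bands show a clearly higher share in equity-type avenues on the real data, that would support the claim; with a small survey, the per-band counts (`pd.crosstab` without `normalize`) are worth printing too.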

**Suggestions:**

Here are several suggestions for improving investment strategies based on the insights from the dataset:

Educational Initiatives: Implement educational programs that focus on investment basics, risk management, and the benefits of diversification. This can empower investors to make informed decisions, particularly for younger individuals who may be less experienced.

Tailored Investment Products: Develop customized investment products that cater to specific demographic groups, such as women or senior citizens. For example, offering lower-risk options or socially responsible investment funds can attract these segments.

Enhanced Digital Platforms: Improve online investment platforms to provide intuitive interfaces and real-time data analytics. Features like personalized dashboards can help investors monitor their portfolios more effectively.

Incentives for Long-Term Investment: Create incentive structures for long-term investments, such as tax benefits or loyalty programs, to encourage investors to commit to longer durations and reduce churn.

Regular Market Updates: Offer regular updates and insights about market trends and economic indicators to help investors understand the evolving landscape. This can enhance their confidence in decision-making.

Focus on Transparency: Ensure transparency in fees and charges associated with investment products. Clear communication about costs can build trust and encourage more investments.

Diversification Strategies: Encourage diversification across various asset classes to mitigate risks. Providing tools and resources to help investors build balanced portfolios can enhance overall performance.

Feedback Mechanisms: Establish channels for investor feedback to continuously refine investment offerings and address concerns. Understanding investor experiences can lead to improved services and products.