
---

### Part 1: Overall Risk Profile (like *Overall Performance*)

These questions provide a high-level overview of the loan portfolio and its inherent risk.

1.  **Question:** What is the total number of loan applications in the dataset?
    *   **Analytical Goal:** To understand the size of the dataset (equivalent to *Total Orders*).
    *   **Key Variables:** `SK_ID_CURR`

2.  **Question:** What is the overall default rate for all applications?
    *   **Analytical Goal:** To establish the single most important baseline metric for the entire portfolio (equivalent to *Total Profit/Loss*).
    *   **Key Variables:** `TARGET`

3.  **Question:** How many applications fall into each category: "Defaulted" (`TARGET` = 1) vs. "Repaid" (`TARGET` = 0)?
    *   **Analytical Goal:** To visualize the class imbalance, which any downstream machine learning task must account for (e.g., in metric choice and resampling).
    *   **Key Variables:** `TARGET`

4.  **Question:** What is the distribution of loan types (`NAME_CONTRACT_TYPE`)? Are cash loans or revolving loans more common?
    *   **Analytical Goal:** To understand the composition of the loan portfolio (equivalent to *Sales by Category*).
    *   **Key Variables:** `NAME_CONTRACT_TYPE`

5.  **Question:** How does the default rate differ between Cash loans and Revolving loans?
    *   **Analytical Goal:** To see if one type of loan product is inherently riskier than another (equivalent to *Profit by Category*).
    *   **Key Variables:** `NAME_CONTRACT_TYPE`, `TARGET`

6.  **Question:** What is the distribution of applicants by gender?
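    *   **Analytical Goal:** A basic demographic breakdown of the applicant pool (equivalent to *Sales by Segment*). `CODE_GENDER` also contains a few `XNA` (unknown) entries, which should be reported separately or excluded.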
    *   **Key Variables:** `CODE_GENDER`

7.  **Question:** Does the default rate vary significantly between male and female applicants?
    *   **Analytical Goal:** To check for risk disparity across genders.
    *   **Key Variables:** `CODE_GENDER`, `TARGET`
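
Since `TARGET` is a 0/1 flag (1 = defaulted), every default rate in this part is simply the mean of `TARGET`, either over the whole table or within each group. A minimal pandas sketch, assuming the application table is already loaded as a DataFrame; the helper names `default_rate` and `default_rate_by` and the five-row `toy` frame are hypothetical and exist only for illustration:

```python
import pandas as pd


def default_rate(df: pd.DataFrame) -> float:
    """Overall default rate: the share of applications with TARGET == 1."""
    return float(df["TARGET"].mean())


def default_rate_by(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Application count and default rate for each value of `column`."""
    return (
        df.groupby(column)["TARGET"]
        .agg(applications="count", default_rate="mean")
        .sort_values("default_rate", ascending=False)
    )


# Hypothetical toy rows for illustration only (not real data).
toy = pd.DataFrame({
    "SK_ID_CURR": [100001, 100002, 100003, 100004, 100005],
    "NAME_CONTRACT_TYPE": ["Cash loans", "Cash loans", "Revolving loans",
                           "Cash loans", "Revolving loans"],
    "CODE_GENDER": ["F", "M", "F", "M", "F"],
    "TARGET": [0, 1, 0, 1, 0],
})

n_applications = toy["SK_ID_CURR"].nunique()              # Q1
overall = default_rate(toy)                               # Q2
class_counts = toy["TARGET"].value_counts()               # Q3
by_contract = default_rate_by(toy, "NAME_CONTRACT_TYPE")  # Q5
by_gender = default_rate_by(toy, "CODE_GENDER")           # Q7
```

Counting unique `SK_ID_CURR` values rather than rows guards against accidental duplicates after joins, and `value_counts(normalize=True)` returns class shares instead of raw counts.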

---

### Part 2: Applicant Demographics & Socio-Economic Status (like *Customer Analysis*)

These questions delve into who the applicants are and how their backgrounds correlate with risk.

8.  **Question:** What is the age distribution of all applicants?
    *   **Analytical Goal:** To understand the age profile of the customer base.
    *   **Key Variables:** `DAYS_BIRTH` (stored as negative days before the application date, so age in years is `-DAYS_BIRTH / 365`)

9.  **Question:** How does the default rate trend across different age groups (e.g., 20-30, 30-40, etc.)?
    *   **Analytical Goal:** To identify if age is a significant risk factor.
    *   **Key Variables:** `DAYS_BIRTH`, `TARGET`

10. **Question:** What is the default rate for different family statuses (`NAME_FAMILY_STATUS`)?
    *   **Analytical Goal:** To assess if an applicant's marital or family situation correlates with their repayment behavior.
    *   **Key Variables:** `NAME_FAMILY_STATUS`, `TARGET`

11. **Question:** How does the distribution of applicant income (`AMT_INCOME_TOTAL`) differ between those who defaulted and those who did not?
    *   **Analytical Goal:** To see whether lower income is a strong indicator of default. Overlapping density plots work well here; because income is heavily right-skewed, a log scale keeps the bulk of the distribution readable.
    *   **Key Variables:** `AMT_INCOME_TOTAL`, `TARGET`

12. **Question:** What is the breakdown of applicants by their highest level of education (`NAME_EDUCATION_TYPE`)?
    *   **Analytical Goal:** To understand the educational background of the applicant pool.
    *   **Key Variables:** `NAME_EDUCATION_TYPE`

13. **Question:** Is there a correlation between education level and default rate?
    *   **Analytical Goal:** To determine whether higher education is associated with lower credit risk.
    *   **Key Variables:** `NAME_EDUCATION_TYPE`, `TARGET`

14. **Question:** What is the default rate for applicants with and without children?
    *   **Analytical Goal:** To see if the number of dependents impacts repayment ability.
    *   **Key Variables:** `CNT_CHILDREN`, `TARGET`

15. **Question:** How does the default rate differ between applicants who own a car and those who don't?
    *   **Analytical Goal:** To assess if car ownership is a signal of financial stability.
    *   **Key Variables:** `FLAG_OWN_CAR`, `TARGET`

16. **Question:** Similarly, how does the default rate differ between applicants who own real estate and those who don't?
    *   **Analytical Goal:** To check if property ownership is a significant factor in reducing risk.
    *   **Key Variables:** `FLAG_OWN_REALTY`, `TARGET`

17. **Question:** What are the most common housing types (`NAME_HOUSING_TYPE`) for applicants, and what are their associated default rates?
    *   **Analytical Goal:** To understand if living situations (renting, living with parents) correlate with risk.
    *   **Key Variables:** `NAME_HOUSING_TYPE`, `TARGET`
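
`DAYS_BIRTH` counts days backwards from the application date, so it is negative and must be flipped and rescaled before age bands like those in question 9 make sense. A sketch assuming pandas, a 365-day year, and 10-year bands from 20 to 70; the band edges, the `AGE_YEARS` column, the helper names, and the toy rows are illustrative choices, not part of the dataset:

```python
import pandas as pd

AGE_BINS = [20, 30, 40, 50, 60, 70]  # assumed band edges, in years
AGE_LABELS = ["20-30", "30-40", "40-50", "50-60", "60-70"]


def add_age_years(df: pd.DataFrame) -> pd.DataFrame:
    """Add AGE_YEARS from DAYS_BIRTH (negative days before the application)."""
    out = df.copy()
    out["AGE_YEARS"] = -out["DAYS_BIRTH"] / 365
    return out


def default_rate_by_age_group(df: pd.DataFrame) -> pd.Series:
    """Default rate per age band; ages outside [20, 70) are left out."""
    aged = add_age_years(df)
    band = pd.cut(aged["AGE_YEARS"], bins=AGE_BINS, labels=AGE_LABELS,
                  right=False).rename("AGE_GROUP")
    # observed=False keeps empty bands in the output (their rate is NaN).
    return aged.groupby(band, observed=False)["TARGET"].mean()


# Hypothetical toy rows for illustration only.
toy = pd.DataFrame({
    "DAYS_BIRTH": [-9000, -10000, -13000, -16000, -20000, -24000],
    "TARGET": [1, 1, 0, 1, 0, 0],
})

age_rates = default_rate_by_age_group(toy)
```

The same `groupby(...)["TARGET"].mean()` pattern answers questions 10 and 13 to 17 when the grouping key is a categorical column such as `NAME_FAMILY_STATUS` or `FLAG_OWN_REALTY`.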

---

### Part 3: Loan & Financial Characteristics (like *Product-Level Analysis*)

These questions focus on the specifics of the loan itself and the applicant's financial health.

18. **Question:** What is the distribution of loan amounts (`AMT_CREDIT`)?
    *   **Analytical Goal:** To understand the typical size of the loans in the portfolio.
    *   **Key Variables:** `AMT_CREDIT`

19. **Question:** Is there a relationship between the loan amount (`AMT_CREDIT`) and the price of the goods the loan is for (`AMT_GOODS_PRICE`)?
    *   **Analytical Goal:** To check for consistency and identify loans that are much larger than the asset price (potential risk).
    *   **Key Variables:** `AMT_CREDIT`, `AMT_GOODS_PRICE`

20. **Question:** What are the most common income types (`NAME_INCOME_TYPE`) for applicants?
    *   **Analytical Goal:** To segment applicants by their source of income (e.g., Working, Pensioner).
    *   **Key Variables:** `NAME_INCOME_TYPE`

21. **Question:** Which income types have the highest and lowest default rates?
    *   **Analytical Goal:** To identify the riskiest employment segments (e.g., are pensioners more or less risky than state servants?).
    *   **Key Variables:** `NAME_INCOME_TYPE`, `TARGET`

22. **Question:** What is the distribution of the "Credit to Income Ratio" (`AMT_CREDIT / AMT_INCOME_TOTAL`)?
    *   **Analytical Goal:** To analyze how leveraged applicants are; leverage is a widely used credit-risk indicator (equivalent to the *Impact of Discount*).
    *   **Key Variables:** `AMT_CREDIT`, `AMT_INCOME_TOTAL`

23. **Question:** How does the default rate change as the "Credit to Income Ratio" increases?
    *   **Analytical Goal:** To confirm the hypothesis that highly leveraged clients are more likely to default.
    *   **Key Variables:** `AMT_CREDIT`, `AMT_INCOME_TOTAL`, `TARGET`

24. **Question:** What is the distribution of employment duration (`DAYS_EMPLOYED`) for applicants?
    *   **Analytical Goal:** To see if the applicant pool consists of people with stable jobs or recent hires.
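    *   **Key Variables:** `DAYS_EMPLOYED` (negative days before application; a placeholder value of 365243, roughly 1,000 years, appears mainly for pensioners and unemployed applicants and should be treated as missing)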

25. **Question:** Does a longer employment history correlate with a lower default rate?
    *   **Analytical Goal:** To validate if job stability is a good indicator of creditworthiness.
    *   **Key Variables:** `DAYS_EMPLOYED`, `TARGET`

26. **Question:** Which organization types (`ORGANIZATION_TYPE`) have the most applicants?
    *   **Analytical Goal:** To understand which industries or sectors our customers work in.
    *   **Key Variables:** `ORGANIZATION_TYPE`

27. **Question:** Which organization types have the highest and lowest default rates?
    *   **Analytical Goal:** To perform risk assessment at an industry level (equivalent to *Profit by Sub-Category*).
    *   **Key Variables:** `ORGANIZATION_TYPE`, `TARGET`
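
Questions 22 to 25 need two derived inputs: the credit-to-income ratio and a cleaned employment duration. `DAYS_EMPLOYED` is negative like `DAYS_BIRTH`, but it also contains a placeholder value of 365243 that is treated as missing here rather than as a real tenure. A pandas sketch; equal-frequency bins via `pd.qcut` are one reasonable way to ask whether the default rate rises with the ratio, and the helper names and toy rows are hypothetical:

```python
import numpy as np
import pandas as pd

DAYS_EMPLOYED_PLACEHOLDER = 365243  # sentinel, not a real employment length


def add_financial_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add CREDIT_INCOME_RATIO and YEARS_EMPLOYED (NaN for the sentinel)."""
    out = df.copy()
    out["CREDIT_INCOME_RATIO"] = out["AMT_CREDIT"] / out["AMT_INCOME_TOTAL"]
    employed = out["DAYS_EMPLOYED"].replace(DAYS_EMPLOYED_PLACEHOLDER, np.nan)
    out["YEARS_EMPLOYED"] = -employed / 365
    return out


def default_rate_by_quantile(df: pd.DataFrame, column: str,
                             q: int = 4) -> pd.Series:
    """Default rate within q equal-frequency bins of `column`."""
    bins = pd.qcut(df[column], q=q, duplicates="drop").rename(f"{column}_BIN")
    return df.groupby(bins, observed=False)["TARGET"].mean()


# Hypothetical toy rows for illustration only.
toy = pd.DataFrame({
    "AMT_INCOME_TOTAL": [100000] * 8,
    "AMT_CREDIT": [100000, 150000, 200000, 250000,
                   300000, 350000, 400000, 450000],
    "DAYS_EMPLOYED": [-365, -730, 365243, -1825, -3650, -400, -5000, 365243],
    "TARGET": [0, 0, 0, 0, 0, 1, 1, 1],
})

features = add_financial_features(toy)
ratio_rates = default_rate_by_quantile(features, "CREDIT_INCOME_RATIO")
```

Passing `"YEARS_EMPLOYED"` to `default_rate_by_quantile` covers question 25; rows with the placeholder become NaN and fall outside every bin.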

---

### Part 4: External & Internal Scoring Analysis (Advanced & Combined Views)

This section focuses on the highly predictive normalized scores and other internal flags.

28. **Question:** How do the three external source scores (`EXT_SOURCE_1`, `2`, `3`) correlate with each other?
    *   **Analytical Goal:** To understand the redundancy and relationship between these key predictive features. A correlation heatmap is perfect here.
    *   **Key Variables:** `EXT_SOURCE_1`, `EXT_SOURCE_2`, `EXT_SOURCE_3`

29. **Question:** What is the distribution of each `EXT_SOURCE` score for applicants who defaulted vs. those who did not?
    *   **Analytical Goal:** To check whether lower external scores are associated with higher default risk.
    *   **Key Variables:** `EXT_SOURCE_1`, `EXT_SOURCE_2`, `EXT_SOURCE_3`, `TARGET`

30. **Question:** What is the relationship between applicant Age and `EXT_SOURCE_1`?
    *   **Analytical Goal:** To see how strongly `EXT_SOURCE_1` is associated with age, which hints at how much age-related information the external score already carries.
    *   **Key Variables:** `DAYS_BIRTH`, `EXT_SOURCE_1`

31. **Question:** How many documents (`FLAG_DOCUMENT_...`) are typically submitted by applicants?
    *   **Analytical Goal:** To understand the completeness of applications.
    *   **Key Variables:** All `FLAG_DOCUMENT_...` columns.

32. **Question:** Is there a correlation between providing a specific document (e.g., `FLAG_DOCUMENT_3`) and the default rate?
    *   **Analytical Goal:** To see if the presence or absence of certain documents is a risk indicator.
    *   **Key Variables:** `FLAG_DOCUMENT_3`, `TARGET`

33. **Question:** How does the default rate vary with the number of credit bureau inquiries in the year before application?
    *   **Analytical Goal:** To test the hypothesis that many recent credit inquiries signal financial distress and higher risk.
    *   **Key Variables:** `AMT_REQ_CREDIT_BUREAU_YEAR` (per the column description, this count excludes the last three months before application), `TARGET`
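
The views in this part reduce to a few pandas operations: a pairwise correlation matrix of the three scores (question 28), a row-wise count over the `FLAG_DOCUMENT_` columns (question 31), and a grouped default rate over bureau inquiries (question 33). A sketch with hypothetical helper names and toy rows; capping the inquiry count at 5 is an assumed choice that pools sparse high counts into one bucket:

```python
import numpy as np
import pandas as pd

EXT_COLS = ["EXT_SOURCE_1", "EXT_SOURCE_2", "EXT_SOURCE_3"]


def ext_source_correlation(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlations; missing values are excluded pair by pair."""
    return df[EXT_COLS].corr()


def documents_submitted(df: pd.DataFrame) -> pd.Series:
    """Number of FLAG_DOCUMENT_* flags set to 1 for each application."""
    doc_cols = [c for c in df.columns if c.startswith("FLAG_DOCUMENT_")]
    return df[doc_cols].sum(axis=1)


def default_rate_by_inquiries(df: pd.DataFrame, cap: int = 5) -> pd.Series:
    """Default rate per yearly inquiry count, with counts >= cap pooled."""
    inquiries = (df["AMT_REQ_CREDIT_BUREAU_YEAR"]
                 .clip(upper=cap).rename("INQUIRIES_CAPPED"))
    # Rows with a missing inquiry count are dropped by groupby.
    return df.groupby(inquiries)["TARGET"].mean()


# Hypothetical toy rows for illustration only.
toy = pd.DataFrame({
    "EXT_SOURCE_1": [0.1, 0.2, 0.3, 0.4],
    "EXT_SOURCE_2": [0.2, 0.4, 0.6, 0.8],
    "EXT_SOURCE_3": [0.8, 0.6, 0.4, 0.2],
    "FLAG_DOCUMENT_2": [0, 0, 0, 1],
    "FLAG_DOCUMENT_3": [1, 1, 0, 1],
    "AMT_REQ_CREDIT_BUREAU_YEAR": [7.0, 0.0, 1.0, np.nan],
    "TARGET": [1, 0, 0, 0],
})

corr = ext_source_correlation(toy)
docs = documents_submitted(toy)
inquiry_rates = default_rate_by_inquiries(toy)
```

The `corr` matrix can be passed straight to a heatmap function such as seaborn's `heatmap` for question 28.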

---

### Part 5: Multi-variable & Interactive Analysis

These questions combine multiple factors to uncover more complex patterns, similar to faceted or interactive charts.

34. **Question:** How does the "Credit to Income Ratio" distribution vary across different education levels?
    *   **Analytical Goal:** To see if people with higher education take on more or less relative debt.
    *   **Key Variables:** `NAME_EDUCATION_TYPE`, `AMT_CREDIT`, `AMT_INCOME_TOTAL`

35. **Question:** What is the average loan amount (`AMT_CREDIT`) by both Family Status and Income Type?
    *   **Analytical Goal:** A 2D analysis to see how different life situations and job types intersect to influence loan size (a heatmap would be good here).
    *   **Key Variables:** `NAME_FAMILY_STATUS`, `NAME_INCOME_TYPE`, `AMT_CREDIT`

36. **Question:** What is the default rate by both Family Status and Income Type?
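    *   **Analytical Goal:** To pinpoint specific high-risk segments, e.g., "Single / not married" applicants with the "Maternity leave" income type. Some of these cells contain very few applicants, so their rates should be read with caution.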
    *   **Key Variables:** `NAME_FAMILY_STATUS`, `NAME_INCOME_TYPE`, `TARGET`

37. **Question:** In a scatter plot of Age vs. Loan Amount, are there distinct clusters for defaulters and non-defaulters?
    *   **Analytical Goal:** To visually explore the interaction between age, loan size, and risk.
    *   **Key Variables:** `DAYS_BIRTH`, `AMT_CREDIT`, `TARGET`

38. **Question:** For applicants who own cars, what is the age of their car (`OWN_CAR_AGE`) and does it correlate with default risk?
    *   **Analytical Goal:** To see if the condition/age of an asset provides a signal about financial health.
    *   **Key Variables:** `OWN_CAR_AGE`, `TARGET`

39. **Question:** How does the distribution of `EXT_SOURCE_2` look when faceted by `NAME_CONTRACT_TYPE`?
    *   **Analytical Goal:** To see if the external scores behave differently for cash vs. revolving loans.
    *   **Key Variables:** `EXT_SOURCE_2`, `NAME_CONTRACT_TYPE`

40. **Question:** What is the relationship between the number of family members and the loan annuity (`AMT_ANNUITY`)?
    *   **Analytical Goal:** To check if larger families opt for loans with smaller, more manageable repayment installments.
    *   **Key Variables:** `CNT_FAM_MEMBERS`, `AMT_ANNUITY`
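
Questions 35 and 36 are two-way aggregations, which `DataFrame.pivot_table` produces directly; the resulting table can then be drawn as a heatmap. Because some income types are very small, the sketch below also masks cells backed by fewer than `min_count` applications. That threshold, the helper name `segment_table`, and the toy rows are assumptions for illustration:

```python
import pandas as pd

SEGMENT_KEYS = ("NAME_FAMILY_STATUS", "NAME_INCOME_TYPE")


def segment_table(df: pd.DataFrame, value: str, aggfunc: str = "mean",
                  min_count: int = 1) -> pd.DataFrame:
    """Family status x income type table of `value`, masking thin cells."""
    rows, cols = SEGMENT_KEYS
    table = df.pivot_table(index=rows, columns=cols, values=value,
                           aggfunc=aggfunc)
    counts = df.pivot_table(index=rows, columns=cols, values=value,
                            aggfunc="count")
    # Cells with fewer than min_count applications (or none) become NaN.
    return table.where(counts >= min_count)


# Hypothetical toy rows for illustration only.
toy = pd.DataFrame({
    "NAME_FAMILY_STATUS": ["Married", "Married", "Single / not married",
                           "Single / not married", "Married"],
    "NAME_INCOME_TYPE": ["Working", "Working", "Working",
                         "Pensioner", "Pensioner"],
    "AMT_CREDIT": [200000, 400000, 150000, 100000, 300000],
    "TARGET": [0, 1, 1, 0, 0],
})

mean_credit = segment_table(toy, "AMT_CREDIT")             # Q35
default_rates = segment_table(toy, "TARGET")               # Q36
robust_rates = segment_table(toy, "TARGET", min_count=2)
```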

---

### Part 6: Hypothetical & Time-Based Questions (Filling Gaps)

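The dataset has no named regions and no application dates, so it supports neither true geographic breakdowns nor time series; the closest location signals are region-level fields such as `REGION_POPULATION_RELATIVE` and `REGION_RATING_CLIENT`. These questions show what you *would* ask with richer data, or use proxies instead.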

41. **Question (Proxy for Time):** How does the average loan amount vary with the applicant's employment duration?
    *   **Analytical Goal:** To see if people with longer careers apply for larger loans.
    *   **Key Variables:** `DAYS_EMPLOYED`, `AMT_CREDIT`

42. **Question (Proxy for Time):** How does the average `EXT_SOURCE_3` score vary with the applicant's employment duration?
    *   **Analytical Goal:** To see if creditworthiness (as measured by external scores) improves with job stability.
    *   **Key Variables:** `DAYS_EMPLOYED`, `EXT_SOURCE_3`

43. **Question (Hypothetical Geography):** If we had region/state data, which regions would have the highest default rates?
    *   **Analytical Goal:** To identify geographical hotspots of risk.
    *   **Key Variables:** `(Hypothetical Region)`, `TARGET`

44. **Question (Hypothetical Geography):** If we had city data, what would be the correlation between city population density and default rate?
    *   **Analytical Goal:** To explore macro-economic factors related to urbanization and risk.
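    *   **Key Variables:** `(Hypothetical City Population)`, `TARGET`; the existing `REGION_POPULATION_RELATIVE` (normalized population of the applicant's region) is a partial proxy.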

45. **Question (Combined):** For working applicants, what is the default rate for different education levels?
    *   **Analytical Goal:** To see if education is a risk factor even within a single, stable income type.
    *   **Key Variables:** `NAME_INCOME_TYPE` (filtered to 'Working'), `NAME_EDUCATION_TYPE`, `TARGET`

46. **Question (Combined):** What is the average "Annuity to Income Ratio" (`AMT_ANNUITY / AMT_INCOME_TOTAL`) for different housing types?
    *   **Analytical Goal:** To see if people with certain housing situations (e.g., renting) commit a larger portion of their income to loan repayments.
    *   **Key Variables:** `AMT_ANNUITY`, `AMT_INCOME_TOTAL`, `NAME_HOUSING_TYPE`

47. **Question (Distribution):** What is the distribution of the "Annuity to Credit Ratio" (`AMT_ANNUITY / AMT_CREDIT`)?
    *   **Analytical Goal:** This ratio reflects both the interest rate and the loan term (at a fixed rate, a higher ratio means a shorter term). Understanding its distribution can reveal patterns in loan products.
    *   **Key Variables:** `AMT_ANNUITY`, `AMT_CREDIT`

48. **Question (Interaction):** How does the default rate change with `EXT_SOURCE_1` for different loan types?
    *   **Analytical Goal:** To see if the predictive power of an external score is consistent across different products.
    *   **Key Variables:** `EXT_SOURCE_1`, `NAME_CONTRACT_TYPE`, `TARGET`

49. **Question (Deep Dive):** For the highest risk `ORGANIZATION_TYPE`, what is the age and income distribution of its applicants?
    *   **Analytical Goal:** To create a detailed profile of the riskiest customer segments.
    *   **Key Variables:** `ORGANIZATION_TYPE`, `DAYS_BIRTH`, `AMT_INCOME_TOTAL`

50. **Question (Final Sanity Check):** In a parallel categories plot, can we visualize the flow of applicants from `NAME_INCOME_TYPE` to `NAME_EDUCATION_TYPE` to the final `TARGET` status?
    *   **Analytical Goal:** To get a holistic, visual summary of the relationships between the most important categorical variables and the final outcome.
    *   **Key Variables:** `NAME_INCOME_TYPE`, `NAME_EDUCATION_TYPE`, `TARGET`
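
Several of the combined questions in this part only need derived ratios plus a filter before the usual grouped mean: question 45 restricts to `NAME_INCOME_TYPE == "Working"`, and questions 46 and 47 use the annuity-to-income and annuity-to-credit ratios. A minimal pandas sketch; the helper names and toy rows are hypothetical:

```python
import pandas as pd


def add_payment_ratios(df: pd.DataFrame) -> pd.DataFrame:
    """Add ANNUITY_INCOME_RATIO (Q46) and ANNUITY_CREDIT_RATIO (Q47)."""
    out = df.copy()
    out["ANNUITY_INCOME_RATIO"] = out["AMT_ANNUITY"] / out["AMT_INCOME_TOTAL"]
    out["ANNUITY_CREDIT_RATIO"] = out["AMT_ANNUITY"] / out["AMT_CREDIT"]
    return out


def working_default_rate_by_education(df: pd.DataFrame) -> pd.Series:
    """Default rate per education level among 'Working' applicants (Q45)."""
    working = df[df["NAME_INCOME_TYPE"] == "Working"]
    return working.groupby("NAME_EDUCATION_TYPE")["TARGET"].mean()


# Hypothetical toy rows for illustration only.
toy = pd.DataFrame({
    "AMT_INCOME_TOTAL": [100000, 200000, 100000, 150000],
    "AMT_ANNUITY": [10000, 20000, 30000, 15000],
    "AMT_CREDIT": [200000, 400000, 300000, 300000],
    "NAME_HOUSING_TYPE": ["House / apartment", "House / apartment",
                          "Rented apartment", "With parents"],
    "NAME_INCOME_TYPE": ["Working", "Working", "Working", "Pensioner"],
    "NAME_EDUCATION_TYPE": ["Higher education",
                            "Secondary / secondary special",
                            "Secondary / secondary special",
                            "Higher education"],
    "TARGET": [0, 1, 1, 0],
})

ratios = add_payment_ratios(toy)
burden_by_housing = (ratios.groupby("NAME_HOUSING_TYPE")
                     ["ANNUITY_INCOME_RATIO"].mean())   # Q46
working_rates = working_default_rate_by_education(toy)  # Q45
```

Applications with a missing `AMT_ANNUITY` get a NaN ratio, which `mean()` skips by default.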