# Homework 5: Data Reshaping with tidyr

**Course:** Data Wrangling in R for Business Analytics  
**Topic:** Data Reshaping and Tidy Data Principles  
**Due Date:** 09/28/2025  
QUESTIONS ONLY

---

## Assignment Overview

This homework focuses on mastering data reshaping techniques using R's tidyverse ecosystem, specifically the `tidyr` package. You'll work with real-world business datasets to practice converting between wide and long formats and learn when each format is most appropriate for analysis.

### Learning Objectives
- Master `pivot_longer()` and `pivot_wider()` functions for data reshaping
- Understand the principles of tidy data and their business applications
- Apply appropriate data structures for different analytical purposes
- Validate data integrity during transformation processes
- Prepare data for visualization and statistical analysis

### Business Context
Data reshaping is a fundamental skill in business analytics. Different analytical tasks, visualization requirements, and stakeholder needs often require data in specific formats. This assignment will help you develop the strategic thinking needed to choose and implement appropriate data structures.

---

## Instructions

**Submission Requirements:**
- Complete all tasks in this R notebook
- Use the pipe operator (`%>%`) and chain operations wherever possible
- Ensure your code is well-commented and demonstrates understanding
- Include business interpretations of your results
- Submit your completed notebook file

**Evaluation Criteria:**
- Correct implementation of reshaping functions
- Appropriate choice of data formats for different tasks
- Quality of code comments and explanations
- Business insight and interpretation
- Data validation and quality checks

---

## Reflection Questions

### 📝 **Critical Thinking and Learning Assessment**

Please provide thoughtful responses to the following reflection questions. Your answers should demonstrate understanding of both technical concepts and business applications of data reshaping.

---

### **Question 1: Strategic Format Selection** 🎯
*Describe a specific business scenario from your current or future workplace where you would need to convert data from wide to long format. Explain your reasoning for choosing long format and what type of analysis this would enable. Include details about the stakeholders involved and how the format choice would impact their ability to understand and use the results.*

**Your Response:**
```
I am going to use my past summer internship in Venture Capital as an example for this scenario. One specific case where I would need to convert data from wide to long format is when handling portfolio company KPIs. For example, during reporting season, we often received quarterly data in a wide format with columns like “Revenue Q1,” “Revenue Q2,” “Revenue Q3,” and so on. While this looks fine in a spreadsheet, it is difficult to analyze in R because each quarter is stored in a separate column, so the time dimension lives in the column names instead of in a variable of its own. By converting the data into long format with pivot_longer() from tidyr, each row represents one company, one quarter, and the KPI value for that quarter. This structure makes it easier to run time-series models, create ggplot trend charts, and compare growth patterns across the portfolio. The stakeholders who benefit from this shift include the investment team, partners, and limited partners (LPs). For the investment team, long format enables more flexible visualizations in R. LPs would not see the long table itself, but it is what makes it practical to produce the clearer charts and summaries they receive, which highlight performance over time instead of presenting static tables. Choosing long format in R would not only improve the accuracy and scalability of our analysis, but also make the final results easier for LPs to understand and use when evaluating fund performance.
```
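A minimal R sketch of the wide-to-long conversion described in this response, assuming tidyr (1.0 or later, where `pivot_longer()` was introduced) and dplyr are installed. The company names, the `revenue_q1`-style column names, and the values are hypothetical placeholders, not real portfolio data.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide-format KPI report: one row per company, one column per quarter
kpi_wide <- tibble(
  company    = c("Company A", "Company B"),
  revenue_q1 = c(1.2, 0.8),
  revenue_q2 = c(1.5, 0.9),
  revenue_q3 = c(1.9, 1.1)
)

# Wide -> long: one row per company-quarter, with the quarter label in its own column
kpi_long <- kpi_wide %>%
  pivot_longer(
    cols         = starts_with("revenue_"),
    names_to     = "quarter",
    names_prefix = "revenue_",   # strip the prefix so quarter holds "q1", "q2", ...
    values_to    = "revenue"
  )

# Long format makes per-company growth a simple grouped calculation
# (character labels like "q1".."q9" sort correctly; more quarters would need real dates)
kpi_growth <- kpi_long %>%
  group_by(company) %>%
  arrange(quarter, .by_group = TRUE) %>%
  mutate(qoq_growth = revenue / lag(revenue) - 1) %>%
  ungroup()

kpi_growth
```

Because every quarter now sits in one `quarter` column, the same table can feed a grouped growth calculation like the one above or a trend chart without any per-quarter column handling.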

---

### **Question 2: Validation and Data Integrity** 🔍
*During this homework, we implemented several validation checks after each reshaping operation. Reflect on why data validation is crucial in business analytics and describe what could happen if validation steps were skipped. Provide a specific example of a business decision that could be negatively impacted by unvalidated data transformations.*

**Your Response:**
```
Data validation is crucial in business analytics because it ensures that the information we are analyzing and presenting is accurate and reliable. If validation steps are skipped, errors in the data could go unnoticed and lead to misleading insights or poor decision making. For example, if data is reshaped from wide to long format in R and values are accidentally duplicated or dropped, the results of any time-series analysis would be skewed. In a business setting, this could cause major problems. One specific example from my past VC internship would be portfolio company KPI reporting. If I transformed quarterly revenue data without validating row counts, totals, and time periods, I might miscalculate growth rates and misinterpret a company's performance. That type of mistake could lead the partners to present incorrect numbers to limited partners, hurting fundraising discussions and the firm's credibility. Skipping validation not only risks flawed analysis but also damages trust with stakeholders who rely on accurate reporting.
```
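A minimal sketch of the kind of post-reshape checks this response describes (row counts, duplicated keys, unchanged totals, missing values), assuming tidyr and dplyr are installed. The data frame and column names are hypothetical, and these four checks are one reasonable set rather than the specific checks used in the homework exercises.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide-format quarterly revenue report
revenue_wide <- tibble(
  company    = c("Company A", "Company B", "Company C"),
  revenue_q1 = c(1.2, 0.8, 2.0),
  revenue_q2 = c(1.5, 0.9, 2.1)
)

revenue_long <- revenue_wide %>%
  pivot_longer(starts_with("revenue_"), names_to = "quarter",
               names_prefix = "revenue_", values_to = "revenue")

quarter_cols <- grep("^revenue_", names(revenue_wide), value = TRUE)

# Check 1: row count should equal companies x quarter columns
stopifnot(nrow(revenue_long) == nrow(revenue_wide) * length(quarter_cols))

# Check 2: each company-quarter key appears exactly once (no duplicated rows)
dupes <- revenue_long %>% count(company, quarter) %>% filter(n > 1)
stopifnot(nrow(dupes) == 0)

# Check 3: the grand total is unchanged (nothing dropped or double-counted)
total_wide <- sum(unlist(revenue_wide[quarter_cols]), na.rm = TRUE)
total_long <- sum(revenue_long$revenue, na.rm = TRUE)
stopifnot(isTRUE(all.equal(total_wide, total_long)))

# Check 4: with the default values_drop_na = FALSE, missing values carry over one-for-one
stopifnot(sum(is.na(revenue_long$revenue)) ==
            sum(is.na(unlist(revenue_wide[quarter_cols]))))
```

`stopifnot()` stops the notebook as soon as a check fails, so a bad reshape cannot silently flow into growth calculations or an LP report.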

---

### **Question 3: Efficiency and Process Improvement** ⚡
*Compare your problem-solving approach at the beginning versus the end of this assignment. How did your thinking about data structure and analysis workflow evolve? Describe how mastering data reshaping could improve efficiency in your academic projects or professional work. Include specific time estimates if possible.*

**Your Response:**
```
At the beginning of this assignment, my problem-solving approach was focused mainly on getting the data into the right shape without fully thinking about efficiency or long-term workflow. I was mostly concerned with making sure the reshaping functions worked and that the output looked correct. By the end of the assignment, my thinking had shifted toward a deeper understanding of how data structure directly affects analysis. Even though I still struggled with some parts, such as tables displaying too many NA values, I realized that organizing data in a consistent, tidy way early on saves a lot of time later when running models, creating visualizations, or sharing results with others. Mastering data reshaping can improve efficiency in both academic projects and professional work. For example, in my past VC internship, converting portfolio KPIs into long format in R could cut hours from the reporting process, since it allows for automated charting and consistent time-series analysis. Instead of spending 2–3 hours manually reformatting tables, reshaping with functions like pivot_longer() or pivot_wider() could reduce that step to just a few minutes. Over the course of a semester or a summer internship, this efficiency adds up and leaves more time for interpretation and decision-making rather than data cleaning.
```

---

### **Question 4: Stakeholder Communication** 💼
*Imagine you need to present the results of your quarterly sales analysis to two different audiences: (1) the executive team and (2) the data analytics team. How would your choice of data format (wide vs. long) and presentation style differ for each audience? Explain the reasoning behind your approach and how data reshaping enables better stakeholder communication.*

**Your Response:**
```
If I were presenting the results of a quarterly sales analysis to two different audiences, I would choose different data formats and presentation styles to match their needs. For the executive team, I would use wide format because it is easier for them to quickly scan and compare numbers side by side. For example, columns like Q1 Sales, Q2 Sales, and Q3 Sales give them a straightforward view of performance without requiring extra steps to interpret the trends. I would pair this format with charts and visuals that highlight growth, comparisons to targets, and key takeaways, since executives focus more on high-level insights and business impact. For the data analytics team, I would use long format since it is much better for technical analysis and modeling. A structure with one column for quarter and one column for sales makes it possible to easily run regressions, build time-series models, and create detailed visualizations in R. The analytics team would benefit from this because it allows them to dig deeper into patterns, run diagnostics, and uncover drivers of performance. Data reshaping is key because it makes it possible to deliver the same information in different ways depending on the stakeholder. Using the right format for the right audience not only improves clarity but also ensures that each group can act on the data effectively.
```
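A minimal sketch of serving both audiences from one dataset, assuming tidyr and dplyr are installed: the analytics team works from the long table, and `pivot_wider()` produces the side-by-side quarterly table for executives. The regions, quarters, sales figures, and the `Sales_` column prefix are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format quarterly sales: one row per region-quarter
sales_long <- tibble(
  region  = rep(c("East", "West"), each = 3),
  quarter = rep(c("Q1", "Q2", "Q3"), times = 2),
  sales   = c(100, 110, 125, 90, 95, 105)
)

# Analytics team: keep long format; grouping by quarter or region is direct
quarterly_totals <- sales_long %>%
  group_by(quarter) %>%
  summarise(total_sales = sum(sales), .groups = "drop")

# Executive team: widen to one column per quarter for a scannable summary table
sales_exec <- sales_long %>%
  pivot_wider(names_from = quarter, values_from = sales, names_prefix = "Sales_")

# Round trip: lengthening the executive table again recovers the same sales values
sales_back <- sales_exec %>%
  pivot_longer(starts_with("Sales_"), names_to = "quarter",
               names_prefix = "Sales_", values_to = "sales") %>%
  arrange(region, quarter)
stopifnot(identical(sales_back$sales, arrange(sales_long, region, quarter)$sales))

sales_exec
```

The round-trip check is a small design choice that backs up the point in this response: both formats carry the same information, only arranged differently for each audience.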

---

### **Question 5: Future Applications and Learning Transfer** 🚀
*Identify three specific situations in your academic program or career field where you anticipate needing data reshaping skills. For each situation, explain: (a) what type of data you'd be working with, (b) what reshaping operations would be needed, (c) what business insights or decisions would result. How has this homework prepared you to handle these future challenges?*

**Your Response:**
```
One situation where I anticipate needing data reshaping skills is in my academic program when working on group projects that involve survey data. Survey responses often come in wide format, with each question as a separate column. To analyze patterns across respondents or track how answers change over time, I would reshape the data into long format using pivot_longer(), so that each row holds one respondent, one question, and one answer. This would make it possible to run statistical tests, build visualizations, and identify trends in student or consumer behavior. The resulting insight would be clearer comparisons across questions and respondent groups, which could lead to stronger recommendations and more complete class project submissions.
A second situation would be in my career field of Venture Capital, where portfolio companies provide quarterly KPI reports. These reports usually arrive in wide format, with columns for each quarter's revenue, expenses, or churn. Converting the data into long format makes it possible to run time-series analysis, create growth charts, and benchmark performance across companies. This reshaping supports better investment decisions and clearer communication with limited partners who care about performance over time.
A third situation could be in a professional setting where I am working with financial trading or energy supply data. These large datasets often record daily or hourly prices and volumes across different products, with one row per timestamp and product. Reshaping from long to wide with pivot_wider() might be necessary if I need to build a price matrix or compare product performance side by side. The insight gained from this would guide trading strategies, supply chain adjustments, or investment recommendations.
This homework has prepared me to handle these challenges by showing how different data formats connect with different types of analysis. It also showed me the importance of validation, efficiency, and good stakeholder communication, which are skills I know I will continue to use both in school and in my career.

```
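A minimal sketch of the long-to-wide case in the third situation, assuming tidyr and dplyr are installed. The hourly price feed, the product names `power` and `gas`, and the prices are hypothetical; the correlation at the end is just one example of a matrix-based comparison.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format price feed: one row per hour-product observation
prices_long <- tibble(
  hour    = rep(1:3, times = 2),
  product = rep(c("power", "gas"), each = 3),
  price   = c(50, 55, 53, 3.1, 3.2, 3.0)
)

# Long -> wide: one row per hour, one column per product, for side-by-side comparison
prices_wide <- prices_long %>%
  pivot_wider(names_from = product, values_from = price)

# Numeric matrix (hours x products) for matrix-based analysis such as correlations
price_matrix <- as.matrix(prices_wide[, c("power", "gas")])
rownames(price_matrix) <- prices_wide$hour

cor(price_matrix)
```

If some hour-product combinations are missing from the feed, `pivot_wider()` fills those cells with `NA` by default (its `values_fill` argument can supply a different value), which is one common source of unexpected `NA` values after widening.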

---

### **Reflection Grading Rubric:**

| **Criteria** | **Excellent (4)** | **Proficient (3)** | **Developing (2)** | **Needs Improvement (1)** |
|--------------|-------------------|-------------------|-------------------|---------------------------|
| **Technical Understanding** | Demonstrates deep understanding of reshaping concepts and when to apply them | Shows good grasp of concepts with minor gaps | Basic understanding with some confusion | Limited understanding of concepts |
| **Business Application** | Clearly connects technical skills to real business scenarios and decisions | Makes relevant business connections with some detail | Basic business relevance identified | Weak connection to business applications |
| **Critical Thinking** | Provides thoughtful analysis and evaluation of approaches and outcomes | Shows some analysis and reflection on methods | Limited analysis or shallow reflection | Minimal critical thinking evident |
| **Communication** | Clear, professional writing with specific examples and evidence | Generally clear with adequate examples | Somewhat unclear or lacks specific examples | Poor communication or vague responses |
| **Learning Transfer** | Demonstrates ability to apply learning to new situations and identifies growth | Shows some ability to transfer learning | Limited evidence of learning transfer | No clear evidence of learning transfer |

**Total Points: _____ / 20**

---

### **Submission Instructions:**
- Complete all five reflection questions with thoughtful, detailed responses
- Use specific examples from the homework exercises to support your points
- Demonstrate understanding of both technical concepts and business applications
- Proofread your responses for clarity and professionalism
- Submit along with your completed homework notebook