# D214 - Data Analytics Graduate Capstone
### Task 2: Data Analytics Report
___

### A: Research Question

In the international finance industry, the dynamics of the foreign exchange (FX) market pose a considerable challenge to firms operating globally and have become an integral factor in financial decision-making. The volatility of FX rates can significantly affect a firm's profitability. A reliable, reasonably accurate forecast of a given currency rate can therefore give a firm a meaningful advantage in planning and decision-making. Moreover, the ability to forecast FX rates may hold significant value in a company's risk management strategy.

The purpose of this study is to answer the research question: "Is the ARIMA time series forecasting model capable of accurately predicting future foreign exchange rates?" The model will be evaluated on how well it predicts several currency rates with characteristically different behavior. For each currency, the ARIMA model will be trained on historical time series data and then tested on a holdout sample. Its accuracy will be judged by how closely its forecasts match the actual rates in that holdout period.
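
FX rates are time-ordered, so the holdout sample has to come from the end of each series rather than from a random draw. The sketch below illustrates such a chronological split. It assumes observations are stored as `(date, rate)` pairs and that the holdout is the final 90 observations. The function name `chronological_split` and the synthetic data are hypothetical and are not taken from the study.

```python
from datetime import date, timedelta


def chronological_split(observations, test_size=90):
    """Split time-ordered observations into training data and a final holdout.

    `observations` is a list of (date, rate) pairs. They are sorted by date so
    the holdout is always the most recent `test_size` points; nothing is
    shuffled, which keeps future values out of the training set.
    """
    if not 0 < test_size < len(observations):
        raise ValueError("test_size must be between 1 and len(observations) - 1")
    ordered = sorted(observations, key=lambda obs: obs[0])
    return ordered[:-test_size], ordered[-test_size:]


# Hypothetical synthetic data: 200 consecutive days of made-up rates.
start = date(2022, 1, 1)
observations = [(start + timedelta(days=i), 1.0 + i / 1000) for i in range(200)]
train, test = chronological_split(observations, test_size=90)
print(len(train), len(test))  # 110 90
```

A chronological split keeps information from the forecast window out of model training. Without it, the model's apparent accuracy would be inflated.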

Multinational corporations often face FX risk as a consequence of operating across multiple countries and dealing in various currencies. Many factors contribute to the composition and extent of a firm's FX risk exposure. Ultimately, however, the volatility and movement of the currencies in a firm’s portfolio most directly affect its profitability, competitiveness, and overall financial stability. Given the many time series forecasting models available, it is worth assessing their predictive accuracy, particularly for FX rates, so that leaders can make data-driven decisions. By accurately predicting future FX rates, firms can optimize hedging strategies, enhance financial planning, and potentially realize significant cost savings. This, in turn, supports more strategic decision-making and strengthens the firm's ability to guard against unforeseen financial risks.

In this study, the effectiveness of the ARIMA model is assessed against the following pair of hypotheses:

- Null hypothesis:
  - The mean absolute percentage error (MAPE) of the ARIMA model, as applied to a 90-day forecast of future foreign exchange rates, is greater than or equal to 20%.
- Alternate hypothesis:
  - The mean absolute percentage error (MAPE) of the ARIMA model, as applied to a 90-day forecast of future foreign exchange rates, is less than 20%.
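
For reference, the conventional definition of MAPE is given below. Here $y_t$ is the actual rate on forecast day $t$, $\hat{y}_t$ is the ARIMA forecast for that day, and $n$ is the number of forecast days. Treating the 90-day horizon as $n = 90$ forecast observations is an assumption, because the text does not say whether calendar or trading days are counted. The two hypotheses are written to be complementary, so exactly one of them holds for any observed MAPE.

$$
\text{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right|, \qquad n = 90
$$

$$
H_0:\ \text{MAPE} \ge 20\% \qquad \text{vs.} \qquad H_1:\ \text{MAPE} < 20\%
$$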

The null hypothesis holds that the MAPE of the 90-day forecast is 20% or more, which would indicate inadequate accuracy. The alternate hypothesis holds that the MAPE is below 20%, which would reflect more favorable predictive accuracy. These hypotheses frame the empirical testing that follows and allow a detailed analysis of the ARIMA model's practical applicability to FX rate forecasting.
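
The MAPE criterion can be computed directly from a forecast and the corresponding actual rates. The sketch below is a hypothetical helper, not the study's code. It treats the 20% cutoff as a simple threshold check on the observed MAPE rather than as a formal significance test. The three-day example uses made-up numbers.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, returned as a percentage.

    Assumes `actual` and `forecast` are aligned day by day and that no actual
    value is zero (FX rates are strictly positive, so this holds here).
    """
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


def supports_alternate(actual, forecast, threshold=20.0):
    """True when the observed MAPE is below the threshold (a simple cutoff
    check, not a formal significance test)."""
    return mape(actual, forecast) < threshold


# Made-up three-day example with errors of 2%, 0%, and 4%.
actual = [100.0, 50.0, 25.0]
forecast = [102.0, 50.0, 24.0]
print(round(mape(actual, forecast), 6))  # 2.0
print(supports_alternate(actual, forecast))  # True
```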

### B: Data Collection

This analysis will use a basket of daily FX spot rates for several currencies against the US dollar. The behavior of FX rates varies considerably from one currency pair to another. It is therefore important to evaluate model accuracy using currencies whose underlying economies differ significantly.

The Federal Reserve Economic Data (FRED) daily exchange rate series were used for this analysis. FRED refers to these rates by the name of their source release, “H.10”. These data are reliable, widely used, and readily available records of daily FX spot rates for many currency pairs. Most series extend back many years, which provides ample data for training and testing.

The Federal Reserve Bank of St. Louis maintains the FRED database, which is publicly accessible for research and educational purposes. FRED permits the use of its data for academic research on three conditions: the user cites FRED, notes where the data was obtained, and retains any copyright notices that accompany the data.

During data collection for this research, daily time series were gathered for six currencies (GBP, CAD, CNY, JPY, INR, and ZAR), each against the US dollar. The data were obtained from the FRED database for the 40-year period from January 1, 1983, to December 31, 2022.
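
The basket and date range can be captured as a small configuration. The FRED series IDs below are our assumption of the H.10 series that correspond to each currency and should be verified on the FRED website before use. Under that assumption, the GBP series is quoted as US dollars per pound, whereas the other five are quoted as units of foreign currency per US dollar.

```python
from datetime import date

# Assumed FRED H.10 series IDs for each currency in the basket (verify on FRED).
# DEXUSUK is quoted as USD per GBP; the others as foreign currency per USD.
SERIES_IDS = {
    "GBP": "DEXUSUK",
    "CAD": "DEXCAUS",
    "CNY": "DEXCHUS",
    "JPY": "DEXJPUS",
    "INR": "DEXINUS",
    "ZAR": "DEXSFUS",
}

# Study window stated in the text: January 1, 1983 through December 31, 2022.
START_DATE = date(1983, 1, 1)
END_DATE = date(2022, 12, 31)

print(len(SERIES_IDS), END_DATE.year - START_DATE.year + 1)  # 6 40
```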

The collected data consisted of daily exchange rates for the specified currency pairs. The 40-year range supports an extensive historical analysis. Each record contains the date and that day's exchange rate, which is the H.10 noon buying rate in New York City. The data was then transformed into a format suitable for time series analysis, which included handling missing values and ensuring consistency across the currency pairs.
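
One way to ensure consistency across pairs is to align every series on a shared date index. The following is a minimal pandas sketch. It assumes each currency has already been loaded as a date-indexed `Series`. `combine_pairs` is a hypothetical helper name, and the sample rates are illustrative rather than actual FRED values.

```python
import numpy as np
import pandas as pd


def combine_pairs(series_by_currency):
    """Outer-join per-currency rate series into one date-indexed DataFrame.

    Keys of `series_by_currency` become column names. The outer join keeps
    every date seen in any series, so a date missing from one currency shows
    up as NaN in that column instead of shifting rows out of alignment.
    """
    frame = pd.concat(series_by_currency, axis=1, join="outer")
    frame.index = pd.to_datetime(frame.index)
    return frame.sort_index()


# Illustrative values only, not actual FRED rates.
gbp = pd.Series([1.21, 1.22], index=pd.to_datetime(["2022-12-29", "2022-12-30"]))
jpy = pd.Series(
    [133.0, 132.0, 131.0],
    index=pd.to_datetime(["2022-12-28", "2022-12-29", "2022-12-30"]),
)
wide = combine_pairs({"GBP": gbp, "JPY": jpy})
print(wide.shape)  # (3, 2)
```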

Data was collected through the FRED API provided by the Federal Reserve Bank of St. Louis. Programmatic access offered significant advantages in automation and precision. Because the API is well documented, every series could be requested in the same way, which kept the collected data consistent, up to date, and aligned with the requirements of the research question.

One limitation of this method was its reliance on an external API. Depending on a third-party service exposes the process to unexpected changes in the API's structure, limits on request frequency, or the unavailability of specific series, any of which could hinder data collection. FRED has offered this API for many years and it is widely used, so significant breaking changes seem unlikely. Nonetheless, it is prudent to acknowledge these risks and to keep a contingency plan in place to mitigate them.
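
One possible contingency for transient outages and request-frequency limits is to retry failed requests with exponential backoff. The helper below is a hypothetical sketch and is not part of the study's pipeline. Its default `retry_on` exceptions are Python built-ins. When using `requests`, you would set `retry_on` to `requests.exceptions.RequestException` (or a narrower subclass) instead.

```python
import time


def with_retries(func, attempts=3, base_delay=1.0, retry_on=(ConnectionError, TimeoutError)):
    """Call `func`, retrying on the given exceptions with exponential backoff.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts and
    re-raises the last error once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)


calls = {"n": 0}


def flaky_fetch():
    # Hypothetical stand-in for an API call that fails twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary outage")
    return "ok"


result = with_retries(flaky_fetch, attempts=3, base_delay=0.01)
print(result)  # ok
```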

Several challenges arose during data collection, particularly around error handling and missing data. Error handling was implemented to catch unexpected failures during data retrieval and to report informative messages that helped diagnose issues. The H.10 (daily currency rate) data marks missing values with a '.' (period), so these values had to be replaced with `NaN`. This replacement was done with the pandas library as the data was gathered.
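
The sketch below shows one way to convert the '.' placeholders to `NaN` while parsing a FRED observations payload, together with a basic error check. It assumes the JSON shape returned by FRED's `series/observations` endpoint: an `observations` list of records with `date` and `value` fields. It also assumes that an error response carries an `error_message` field. The function name and sample payload are hypothetical, and the sample values are not real rates.

```python
import numpy as np
import pandas as pd


def observations_to_series(payload, series_id):
    """Convert a FRED observations JSON payload into a float Series indexed by date.

    Missing values marked with the string "." become NaN. A payload without an
    "observations" list is treated as an API error and reported clearly.
    """
    if "observations" not in payload:
        message = payload.get("error_message", "no 'observations' key in response")
        raise ValueError(f"FRED request for {series_id} failed: {message}")
    frame = pd.DataFrame(payload["observations"], columns=["date", "value"])
    frame["date"] = pd.to_datetime(frame["date"])
    frame["value"] = frame["value"].replace(".", np.nan).astype(float)
    return frame.set_index("date")["value"].rename(series_id)


# Hypothetical payload with illustrative values; "." marks a missing day.
sample = {
    "observations": [
        {"date": "2022-12-23", "value": "130.00"},
        {"date": "2022-12-26", "value": "."},
        {"date": "2022-12-27", "value": "131.00"},
    ]
}
jpy = observations_to_series(sample, "DEXJPUS")
print(int(jpy.isna().sum()))  # 1
```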

The data-collection process was a critical first step in the research. As such, it was carried out with careful consideration of the advantages and potential pitfalls associated with the chosen methodology. By recognizing and overcoming the inherent challenges, the process successfully laid the groundwork for subsequent stages of the study, ensuring a comprehensive and reliable dataset for foreign exchange rate forecasting.

### C: Data Extraction and Preparation

The data-extraction process was implemented using Python's `requests` library to fetch data from the FRED database's API. The method employed involved constructing a specific URL and making an HTTP request to retrieve the relevant data series, as demonstrated in the following code snippet:
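
A minimal sketch of this request pattern is shown below. It is a reconstruction under stated assumptions rather than the study's original code. It targets FRED's `series/observations` endpoint with JSON output and uses the study's date range as defaults. `YOUR_API_KEY` is a placeholder for a personal FRED API key. `requests` is imported inside `fetch_observations` so that the URL helper works on its own.

```python
from urllib.parse import urlencode

FRED_OBSERVATIONS_URL = "https://api.stlouisfed.org/fred/series/observations"


def build_fred_url(series_id, api_key, start="1983-01-01", end="2022-12-31"):
    """Construct the FRED observations URL for one series in JSON format."""
    query = urlencode({
        "series_id": series_id,
        "api_key": api_key,
        "file_type": "json",
        "observation_start": start,
        "observation_end": end,
    })
    return f"{FRED_OBSERVATIONS_URL}?{query}"


def fetch_observations(series_id, api_key, timeout=30):
    """Request one series and return the decoded JSON payload."""
    import requests  # imported here so build_fred_url works without requests installed

    response = requests.get(build_fred_url(series_id, api_key), timeout=timeout)
    response.raise_for_status()  # surface HTTP errors (bad key, unknown series) immediately
    return response.json()


# "YOUR_API_KEY" is a placeholder, not a working key.
url = build_fred_url("DEXJPUS", "YOUR_API_KEY")
print(url)
```

Passing a `timeout` and calling `raise_for_status()` makes network stalls and HTTP errors fail loudly instead of silently producing an empty dataset.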

This approach allowed for automation, enabling easy and consistent extraction of data and ensuring accurate and up-to-date information. However, it was also sensitive to changes in the API or endpoint structure, which could pose a risk to the continuity of the extraction process.

Once the data was extracted, it was transformed and cleaned using the `pandas` and `numpy` libraries. The extracted data was loaded into a pandas DataFrame, and missing values were handled through forward and backward filling, as illustrated by the following code:
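
The sketch below shows a minimal forward/backward fill. It assumes a date-indexed DataFrame with one column per currency. `fill_gaps` is a hypothetical name, and the sample values are illustrative only. The sketch uses the `DataFrame.ffill()` and `DataFrame.bfill()` methods, which recent pandas versions favor over the deprecated `fillna(method=...)` form.

```python
import numpy as np
import pandas as pd


def fill_gaps(frame: pd.DataFrame) -> pd.DataFrame:
    """Forward-fill each column, then backfill any gaps left at the start.

    After ffill(), the only remaining NaNs are those before a column's first
    valid value, so bfill() touches only those leading rows.
    """
    return frame.sort_index().ffill().bfill()


# Illustrative values only, not actual FRED rates.
dates = pd.to_datetime(["2022-12-23", "2022-12-26", "2022-12-27", "2022-12-28"])
rates = pd.DataFrame(
    {"GBP": [np.nan, 1.20, np.nan, 1.21], "JPY": [130.0, np.nan, 131.0, np.nan]},
    index=dates,
)
filled = fill_gaps(rates)
print(int(filled.isna().sum().sum()))  # 0
```

Forward filling uses only past information. The trailing `bfill()` copies a later value backward, but only into the leading gap at the start of each series.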

These libraries offered flexibility and efficiency. They sped up data preparation and moved the data from its raw form to a clean, structured format suitable for analysis. Nonetheless, this process required careful attention to the characteristics of the data, such as the '.' placeholders that mark missing observations, which added a layer of complexity to the preparation phase.

Together, `requests`, `pandas`, and `numpy` were integral to the data extraction and preparation processes, offering automation, accuracy, adaptability, and efficiency. These tools eased the transition from raw data to an analysis-ready format, despite the complexity of the data and the dependency on an external service. The methodology illustrates a practical approach to assembling a multi-decade, multi-currency dataset. It also underscores the importance of tool selection and methodological design in modern data science.

### D: Analysis

### E: Data Summary and Implications