# EDA 
* **Approach** 
    * This section introduces EDA (Exploratory Data Analysis). It answers the questions listed after this overview and provides the necessary frame of reference for EDA assumptions, principles, and techniques.
    * Exploratory Data Analysis (EDA) is an approach/philosophy for data analysis that employs a variety of techniques (mostly graphical) to
 
        * Maximize insight into a data set;
        * Uncover underlying structure;
        * Extract important variables;
        * Detect outliers and anomalies;
        * Test underlying assumptions;
        * Develop parsimonious models; and
        * Determine optimal factor settings.
* **Focus:**
    * The EDA approach is precisely that: an approach. It is not a set of techniques but an attitude, or philosophy, about how a data analysis should be carried out.
* **Philosophy**	
    * EDA is not identical to statistical graphics, although the two terms are often used almost interchangeably.
    * Statistical graphics (histograms, scatter plots, line plots, and so on) is a collection of techniques, all graphically based and each focusing on one aspect of data characterization.
    * EDA covers more ground: it is an approach to data analysis that postpones the usual assumptions about what kind of model the data follow, preferring the more direct approach of letting the data themselves reveal their underlying structure and model.
    * EDA is not a mere collection of techniques; it is a philosophy of how we dissect a data set: what we look for, how we look, and how we interpret what we see.
    * EDA does rely heavily on the collection of techniques we call "statistical graphics", but it is not identical to statistical graphics per se.
* **Techniques**
    * Most EDA techniques are graphical in nature, with a few **quantitative techniques** (for example, summary measures of location and scale, autocorrelation, and formal tests for normality or outliers).
    * EDA relies heavily on graphics because its main role is to explore open-mindedly. Graphics, combined with the natural pattern-recognition capabilities we all possess, give the analyst unparalleled power to do so: they entice the data to reveal their structural secrets and keep the analyst ready to gain new, often unsuspected, insight.
    *  The particular graphical techniques employed in EDA are often quite simple, consisting of various techniques of:
        *  Plotting the raw data (such as data traces, histograms, bihistograms, probability plots, lag plots, block plots, and Youden plots).
        *  Plotting simple statistics such as mean plots, standard deviation plots, box plots, and main effects plots of the raw data.
        *  Positioning such plots so as to maximize our natural pattern-recognition abilities, such as using multiple plots per page.
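
The numbers behind three of the plots named above (box plot, histogram, lag plot) take only a few lines to compute. This is a minimal sketch, assuming Python's standard `statistics` module, the "inclusive" quartile method (quartile conventions vary between tools), the conventional 1.5 × IQR box-plot fences, equal-width histogram bins, and a small made-up sample; the function names are hypothetical. In practice these values would be passed to a plotting library rather than read as text.

```python
from statistics import quantiles

def five_number_summary(data):
    """Minimum, lower quartile, median, upper quartile, maximum: what a box plot draws."""
    # "inclusive" is one of several quartile conventions (an assumption of this sketch).
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

def box_plot_outliers(data, k=1.5):
    """Points beyond the conventional box-plot fences Q1 - k*IQR and Q3 + k*IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

def histogram_counts(data, bins=5):
    """Counts in equal-width bins spanning [min, max]."""
    lo, hi = min(data), max(data)
    counts = [0] * bins
    if hi == lo:  # all values identical: everything falls in the first bin
        counts[0] = len(data)
        return counts
    width = (hi - lo) / bins
    for x in data:
        i = min(int((x - lo) / width), bins - 1)  # keep the maximum in the last bin
        counts[i] += 1
    return counts

def lag_pairs(data, lag=1):
    """(Y[i-lag], Y[i]) pairs (lag >= 1); a lag plot scatters these to expose serial dependence."""
    return list(zip(data[:-lag], data[lag:]))

# Hypothetical sample for illustration only, listed in collection order.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 14.5]

print("five-number summary:", five_number_summary(sample))
print("box-plot outliers:", box_plot_outliers(sample))
for i, c in enumerate(histogram_counts(sample)):
    print(f"bin {i}: {'#' * c}")
print("first lag-1 pairs:", lag_pairs(sample)[:3])
```

With this sample the fences flag only the value 14.5, and the histogram isolates the same point in its top bin. The plots, not the printed numbers, are what EDA actually looks at; the code only shows what each plot summarizes.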

- What is exploratory data analysis? 
- How did it begin? 
- How and where did it originate? 
- How is it differentiated from other data analysis approaches, such as classical and Bayesian? 
- Is EDA the same as statistical graphics? 
- What role does statistical graphics play in EDA? 

## How Does Exploratory Data Analysis Differ from Classical Data Analysis?

1. **Data Analysis Approaches**
    1. EDA is a data analysis approach. What other data analysis approaches exist and how does EDA differ from these other approaches? Three popular data analysis approaches are:
        1. Classical
        2. Exploratory (EDA)
        3. Bayesian
2. **Paradigms for Analysis Techniques**
    1. These three approaches are similar in that they all start with a general science/engineering problem and all yield science/engineering conclusions. The difference is the sequence and focus of the intermediate steps.
        1. For classical analysis, the sequence is `Problem => Data => Model => Analysis => Conclusions`
        2. For EDA, the sequence is `Problem => Data => Analysis => Model => Conclusions`
        3. For Bayesian analysis, the sequence is `Problem => Data => Model => Prior Distribution => Analysis => Conclusions`

3. **The way the underlying model is handled distinguishes the three approaches**
    1. For classical analysis, data collection is followed by the imposition of a model (normality, linearity, etc.), and the analysis, estimation, and testing that follow focus on the parameters of that model. 
    2. For EDA, the data collection is not followed by a model imposition; rather it is followed immediately by analysis with a goal of inferring what model would be appropriate. 
    3. Finally, for a Bayesian analysis, the analyst attempts to incorporate scientific/engineering knowledge/expertise into the analysis by imposing a data-independent distribution on the parameters of the selected model; the analysis thus consists of formally combining both the prior distribution on the parameters and the collected data to jointly make inferences and/or test assumptions about the model parameters.
    4. In the real world, data analysts freely mix elements of all of the above three approaches (and other approaches). The above distinctions were made to emphasize the major differences among the three approaches.
   
4. **Model:**
    1. **Classical:**
        1. The classical approach imposes models (both deterministic and probabilistic) on the data.
        2. Deterministic models include, for example, **regression models** and **analysis of variance (ANOVA) models**.
        3. The most common probabilistic model assumes that the errors about the deterministic model are normally distributed; this assumption affects the validity of the ANOVA F tests.
        4. The two approaches differ substantially in focus. For classical analysis, the **focus is on the model--estimating parameters** of the **model and generating predicted values from the model**.
        5. Classical techniques are generally quantitative in nature. They include **ANOVA, t tests, chi-squared tests, and F tests.**
    2. **Exploratory:**
        1. The Exploratory Data Analysis approach does not impose deterministic or probabilistic models on the data. 
        2. On the contrary, the EDA approach allows the data to suggest admissible models that best fit the data.
        3. For exploratory data analysis, the focus is on the data--**its structure, outliers, and models suggested by the data**.
        4. EDA techniques are generally graphical. They include **scatter plots, character plots, box plots, histograms, bihistograms, probability plots, residual plots, and mean plots**.
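
The difference in focus can be made concrete on one small data set. This is a minimal sketch, assuming Python's standard `statistics` module and made-up measurements at three settings of a single factor; the function names are hypothetical. `one_way_anova_f` computes the classical one-way ANOVA F statistic, whose test relies on the normal-errors assumption described above, while `group_summaries` computes the medians, spreads, and 1.5 × IQR box-plot outlier flags that side-by-side box plots would display.

```python
from statistics import mean, quantiles

# Hypothetical measurements at three settings of one factor (illustrative only).
groups = {
    "A": [10.1, 9.9, 10.0, 10.2, 9.8],
    "B": [10.6, 10.4, 10.5, 10.7, 10.3],
    "C": [10.0, 10.1, 9.9, 10.2, 13.0],
}

def one_way_anova_f(groups):
    """Classical, model-focused view: the one-way ANOVA F statistic.

    F = (between-group SS / (k - 1)) / (within-group SS / (n - k)).
    The F test built on it assumes normal errors with equal variance in every group.
    """
    samples = list(groups.values())
    k = len(samples)
    n = sum(len(s) for s in samples)
    grand_mean = mean(x for s in samples for x in s)
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def group_summaries(groups, fence=1.5):
    """Exploratory, data-focused view: the numbers behind side-by-side box plots."""
    summaries = {}
    for name, s in groups.items():
        q1, med, q3 = quantiles(s, n=4, method="inclusive")
        iqr = q3 - q1
        outliers = [x for x in s if x < q1 - fence * iqr or x > q3 + fence * iqr]
        summaries[name] = {"median": med, "iqr": iqr, "outliers": outliers}
    return summaries

f_all = one_way_anova_f(groups)
summaries = group_summaries(groups)
print(f"F statistic: {f_all:.2f}")
for name, summary in summaries.items():
    print(name, summary)

# For comparison only (not a recommendation to delete data): F without flagged points.
trimmed = {name: [x for x in s if x not in summaries[name]["outliers"]]
           for name, s in groups.items()}
print(f"F without flagged points: {one_way_anova_f(trimmed):.2f}")
```

With these numbers the F statistic comes out just below 1, which, read in isolation, suggests no difference between settings. The summaries tell a different story: group B's median sits visibly higher, and the single value 13.0 in group C lies outside the fences. That one point inflates the within-group sum of squares; recomputing F without it, purely for comparison, gives a value well above 10. Looking at the data before trusting the imposed model is what exposes the problem.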