# Data types in descriptive statistics

In statistics, it is essential to understand the different types of data you may encounter. Data can be qualitative or quantitative, and each type calls for its own methods of analysis.

## Qualitative data
These data describe characteristics or qualities that are recorded as categories rather than measured numerically. Examples: eye color, type of housing, car make.

## Quantitative data
These data are measured numerically. They come in two types: discrete and continuous. Discrete data are countable (for example, number of children), while continuous data can take any value within a given range (for example, weight or height).
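As a rough illustration of these three kinds of data, here is a minimal sketch in pandas. The `survey` table, its column names, and its values are hypothetical and invented only for this example. The dtype checks give a first hint of each column's statistical type, not a guarantee: integer codes, for instance, can still stand for categories.

```python
import pandas as pd

# Hypothetical survey records, invented purely for illustration.
survey = pd.DataFrame({
    "eye_color": ["brown", "blue", "green", "brown"],  # qualitative
    "num_children": [0, 2, 1, 3],                      # quantitative, discrete
    "height_cm": [162.5, 180.1, 175.0, 168.3],         # quantitative, continuous
})

# Numeric columns are candidates for quantitative data;
# integer columns among them are candidates for discrete data.
is_numeric = {col: pd.api.types.is_numeric_dtype(survey[col]) for col in survey.columns}
is_integer = {col: pd.api.types.is_integer_dtype(survey[col]) for col in survey.columns}

print(is_numeric)
print(is_integer)
```

Here `eye_color` is reported as non-numeric, `num_children` as an integer column, and `height_cm` as numeric but not integer, which matches the qualitative, discrete, and continuous labels above.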

In this notebook, we explore these concepts using the pandas library in Python and a well-known example dataset.

In [None]:
# Importing necessary libraries
import pandas as pd

# Loading the iris dataset from the seaborn-data repository on GitHub (requires internet access)
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')

# Displaying the first few rows of the dataset
iris.head()

The dataset we are using is the famous Iris dataset. It contains measurements for 150 iris flowers from three different species.

The three classes in the Iris dataset, as labeled in the `species` column of this CSV file, are:

1. setosa (n=50)
2. versicolor (n=50)
3. virginica (n=50)

The four features in the Iris dataset are:

1. sepal length in cm
2. sepal width in cm
3. petal length in cm
4. petal width in cm

Let's explore the data a bit more.

In [None]:
# Checking the shape of the dataset
iris.shape

(150, 5)

The dataset contains 150 rows and 5 columns. Each row corresponds to a single flower. The columns correspond to the features of the flower and its species.

Let's check the data types of the columns.

In [None]:
# Checking the data types of the columns
iris.dtypes

The features (sepal length, sepal width, petal length, petal width) are all of type `float64`, and the species column holds text. pandas has traditionally reported text columns as `object`; newer versions may report a dedicated string dtype instead.

This means that our dataset contains both quantitative (the features) and qualitative (the species) data.
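One way to make this split explicit is `DataFrame.select_dtypes`. The sketch below uses a small stand-in frame, `sample`, with the same column names as the iris CSV so that it runs without a network connection. The three rows are illustrative only, and the variable names are assumptions of this example.

```python
import pandas as pd

# Small stand-in with the same column names as the iris CSV; rows are illustrative.
sample = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3],
    "sepal_width": [3.5, 3.2, 3.3],
    "petal_length": [1.4, 4.7, 6.0],
    "petal_width": [0.2, 1.4, 2.5],
    "species": ["setosa", "versicolor", "virginica"],
})

# Numeric columns are treated as quantitative, everything else as qualitative.
quantitative_cols = sample.select_dtypes(include="number").columns.tolist()
qualitative_cols = sample.select_dtypes(exclude="number").columns.tolist()

print(quantitative_cols)
print(qualitative_cols)
```

Selecting on `"number"` rather than on a specific text dtype keeps the split independent of how a given pandas version labels string columns.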

Let's do some basic data analysis.

In [None]:
# Checking the basic statistics of the quantitative data
iris.describe()

The `describe()` method provides a statistical summary of the dataset. By default it covers only the numeric columns, so the species column is left out. The summary includes the count, mean, standard deviation, minimum, 25th percentile, median (50th percentile), 75th percentile, and maximum.
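To see where each number in the summary comes from, the sketch below rebuilds it from the individual pandas methods. The `petal_length` values are hypothetical, chosen so the results are easy to check by hand, and `manual` is a name introduced only for this example.

```python
import pandas as pd

# Hypothetical petal lengths in cm, invented for illustration.
petal_length = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], name="petal_length")

summary = petal_length.describe()

# Each entry of the summary can be reproduced with a dedicated method.
manual = pd.Series({
    "count": petal_length.count(),
    "mean": petal_length.mean(),
    "std": petal_length.std(),           # sample standard deviation (ddof=1)
    "min": petal_length.min(),
    "25%": petal_length.quantile(0.25),
    "50%": petal_length.median(),
    "75%": petal_length.quantile(0.75),
    "max": petal_length.max(),
})

print(summary)
print(manual)
```

Note that `std` is the sample standard deviation, which divides by n - 1 rather than n.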

Now, let's check the distribution of the qualitative data, i.e., the species.

In [None]:
# Checking the distribution of the species
iris['species'].value_counts()

The dataset is balanced, meaning there are equal numbers of samples from each species (50 each of setosa, versicolor, and virginica).
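Balance can also be checked programmatically and expressed as proportions. The following is a minimal sketch on a hypothetical `species` series with two labels per class, not on the real 150-row dataset. `is_balanced` is a helper name introduced here.

```python
import pandas as pd

# Hypothetical species labels, invented for illustration (two per species).
species = pd.Series(["setosa", "versicolor", "virginica"] * 2, name="species")

# Absolute and relative frequencies of each category.
counts = species.value_counts()
proportions = species.value_counts(normalize=True)

# The data are balanced when every category has the same count.
is_balanced = counts.nunique() == 1

print(counts)
print(proportions)
print(is_balanced)
```

Applied to the iris data, the same calls on `iris['species']` should give a proportion of one third for each species.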

This concludes our basic exploration of the dataset. We've seen that it contains both qualitative and quantitative data, and we've examined summary statistics for the quantitative features and the class distribution of the qualitative one.