# Part 2: viewing data

This workbook requires you to load the ``titanic`` and ``avocado`` datasets. You will also need to run the following block of code to import ``numpy`` and ``pandas``:

In [None]:
import pandas as pd
import numpy as np

Load the ``titanic`` and ``avocado`` datasets as ``pandas`` DataFrames in the code block below:

In [None]:
# Load the Titanic and avocado data sets
df_titanic = pd.read_excel("titanic.xlsx")
avocado = pd.read_excel("avocado.xlsx")

## Exploring datasets in more detail

Let's come back to our ``titanic`` example. We can access the index of the DataFrame as follows:


In [None]:
df_titanic.index

By default, the index is a pandas ``RangeIndex``. It works like Python's ``range``: it starts at 0, increases in steps of 1, and its last entry is ``stop - 1``.

We could use a different type of index, but for now we are going to stick with the default one.
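To make the ``range`` analogy concrete, here is a minimal sketch on a small made-up frame. The names ``df_demo`` and ``df_labelled`` are hypothetical stand-ins and are not part of the workbook's data.

```python
import pandas as pd

# Hypothetical three-row frame standing in for df_titanic.
df_demo = pd.DataFrame({"Survived": [0, 1, 1]})

idx = df_demo.index
print(idx)                            # RangeIndex(start=0, stop=3, step=1)
print(idx.start, idx.stop, idx.step)  # 0 3 1
print(list(idx))                      # [0, 1, 2]

# A non-default index can be supplied explicitly, e.g. string labels.
df_labelled = pd.DataFrame({"Survived": [0, 1, 1]}, index=["a", "b", "c"])
print(list(df_labelled.index))        # ['a', 'b', 'c']
```

The last entry of the default index is 2, which is ``stop - 1`` for a frame with three rows.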

How about displaying all the column names of our DataFrame?
We do it as follows:

In [None]:
df_titanic.columns

Index(['PassengerId', 'Name', 'Sex', 'Age', 'Ticket', 'Fare', 'Cabin',
       'Survived'],
      dtype='object')

This ``Index`` object can be treated much like a list or numpy array: we can access its elements by position.

In [None]:
df_titanic.columns[0], df_titanic.columns[-1]

('PassengerId', 'Survived')
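The list-like behaviour goes a little further than single elements. The sketch below uses a hypothetical ``df_demo`` frame (not the Titanic data) to show slicing, length, membership tests, and one difference from a list: an ``Index`` cannot be modified in place.

```python
import pandas as pd

# Hypothetical frame with a few Titanic-like column names.
df_demo = pd.DataFrame({"PassengerId": [1, 2], "Name": ["A", "B"], "Survived": [0, 1]})

cols = df_demo.columns
first_two = cols[:2].tolist()   # slicing works, tolist() gives a plain list
print(first_two)                # ['PassengerId', 'Name']
print(len(cols))                # 3
print("Name" in cols)           # True

# Unlike a list, an Index is immutable: item assignment raises TypeError.
try:
    cols[0] = "Id"
    assignment_failed = False
except TypeError:
    assignment_failed = True
print(assignment_failed)        # True
```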

We already know how to view values of a particular column, e.g.

``df_titanic["Survived"]``

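If the column name is a valid Python identifier (for example, it contains no spaces) and does not clash with an existing DataFrame attribute or method, we can also view the values using attribute access: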

In [None]:
df_titanic.Survived

0      0
1      1
2      1
3      1
4      0
      ..
886    0
887    1
888    0
889    1
890    0
Name: Survived, Length: 891, dtype: int64
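As a rough illustration of when attribute access works and when brackets are needed, consider the following sketch. The frame ``df_demo`` and its ``count`` column are made up for this example.

```python
import pandas as pd

# Hypothetical frame: one simple name, one name with a space,
# and one name that clashes with the DataFrame.count method.
df_demo = pd.DataFrame({"Survived": [0, 1, 1], "Total Bags": [5, 7, 9], "count": [1, 2, 3]})

# Both access styles return the same Series for a simple name.
same = df_demo["Survived"].equals(df_demo.Survived)
print(same)                        # True

# A name containing a space needs bracket access.
total_bags = df_demo["Total Bags"]
print(total_bags.tolist())         # [5, 7, 9]

# df_demo.count is the method, not the column, so brackets are needed here too.
print(callable(df_demo.count))     # True
print(df_demo["count"].tolist())   # [1, 2, 3]
```

Bracket access always works, so it is the safer default when column names are not known in advance.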

To get a numpy array from a ``pd.Series``, we can use ``your_pd_series.values`` (the pandas documentation recommends the equivalent ``your_pd_series.to_numpy()`` for new code):

In [None]:
df_titanic.Survived.values

array([0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1,
       1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1,
       1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1,
       1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0,
       1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0,
       0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0,
       0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0,
       1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0,
       1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1,
       0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0,
       0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0,
       1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1,
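A small sketch of the conversion, using a hypothetical four-element Series ``s_demo`` rather than the full ``Survived`` column. It checks that ``.values`` and ``.to_numpy()`` give the same array here, and that the result supports ordinary numpy operations.

```python
import numpy as np
import pandas as pd

# Hypothetical short Series standing in for df_titanic.Survived.
s_demo = pd.Series([0, 1, 1, 0], name="Survived")

arr = s_demo.to_numpy()
print(type(arr))                            # <class 'numpy.ndarray'>
print(arr)                                  # [0 1 1 0]
print(np.array_equal(arr, s_demo.values))   # True

# The array is plain numpy, so numpy methods apply directly.
print(arr.mean())                           # 0.5
```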

Pandas works closely with numpy. In fact, we can convert a whole DataFrame to a numpy array:

In [None]:
titanic_np_array = df_titanic.to_numpy()
print(titanic_np_array)

[[1 'Braund, Mr. Owen Harris' 'male' ... 7.25 nan 0]
 [2 'Cumings, Mrs. John Bradley (Florence Briggs Thayer)' 'female' ...
  71.2833 'C85' 1]
 [3 'Heikkinen, Miss. Laina' 'female' ... 7.925 nan 1]
 ...
 [889 'Johnston, Miss. Catherine Helen "Carrie"' 'female' ... 23.45 nan 0]
 [890 'Behr, Mr. Karl Howell' 'male' ... 30.0 'C148' 1]
 [891 'Dooley, Mr. Patrick' 'male' ... 7.75 nan 0]]
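A numpy array holds a single dtype, so a frame that mixes text and numbers is converted to an array of generic Python objects, as in the output above. The sketch below illustrates this on a hypothetical two-row frame ``df_demo``, and shows that selecting only numeric columns first gives a numeric array.

```python
import pandas as pd

# Hypothetical frame mixing a text column with numeric columns.
df_demo = pd.DataFrame({"Name": ["A", "B"], "Age": [22.0, 38.0], "Fare": [7.25, 71.2833]})

mixed = df_demo.to_numpy()
print(mixed.dtype)     # object

# Selecting only the numeric columns keeps a numeric dtype.
numeric = df_demo[["Age", "Fare"]].to_numpy()
print(numeric.dtype)   # float64
print(numeric.shape)   # (2, 2)
```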


We can also get a quick statistical summary of our data. This is done via:

In [None]:
df_titanic.describe()

| | PassengerId | Age | Fare | Survived |
|---|---|---|---|---|
| count | 891.0 | 714.0 | 891.0 | 891.0 |
| mean | 446.0 | 29.699118 | 32.204208 | 0.383838 |
| std | 257.353842 | 14.526497 | 49.693429 | 0.486592 |
| min | 1.0 | 0.42 | 0.0 | 0.0 |
| 25% | 223.5 | 20.125 | 7.9104 | 0.0 |
| 50% | 446.0 | 28.0 | 14.4542 | 0.0 |
| 75% | 668.5 | 38.0 | 31.0 | 1.0 |
| max | 891.0 | 80.0 | 512.3292 | 1.0 |


Note that, in a notebook, displaying a pandas object (as in the cell above) produces nicer output than printing it:

In [None]:
print(df_titanic.describe())

       PassengerId         Age        Fare    Survived
count   891.000000  714.000000  891.000000  891.000000
mean    446.000000   29.699118   32.204208    0.383838
std     257.353842   14.526497   49.693429    0.486592
min       1.000000    0.420000    0.000000    0.000000
25%     223.500000   20.125000    7.910400    0.000000
50%     446.000000   28.000000   14.454200    0.000000
75%     668.500000   38.000000   31.000000    1.000000
max     891.000000   80.000000  512.329200    1.000000


Displaying can also be done explicitly with the ``display`` command, as follows:

In [None]:
display(df_titanic.describe())

| | PassengerId | Age | Fare | Survived |
|---|---|---|---|---|
| count | 891.0 | 714.0 | 891.0 | 891.0 |
| mean | 446.0 | 29.699118 | 32.204208 | 0.383838 |
| std | 257.353842 | 14.526497 | 49.693429 | 0.486592 |
| min | 1.0 | 0.42 | 0.0 | 0.0 |
| 25% | 223.5 | 20.125 | 7.9104 | 0.0 |
| 50% | 446.0 | 28.0 | 14.4542 | 0.0 |
| 75% | 668.5 | 38.0 | 31.0 | 1.0 |
| max | 891.0 | 80.0 | 512.3292 | 1.0 |


Note that the above statistics cover only the numerical columns. The count for ``Age`` is 714 rather than 891 because missing values are excluded.
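If a summary of the non-numerical columns is wanted as well, ``describe`` accepts an ``include`` argument. The following is a minimal sketch on a hypothetical three-row frame ``df_demo`` with one missing age. It is not taken from the workbook's data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one text column and one numeric column containing a gap.
df_demo = pd.DataFrame({"Sex": ["male", "female", "female"], "Age": [22.0, np.nan, 26.0]})

# Default: only numerical columns, and count skips the missing value.
num = df_demo.describe()
print(list(num.columns))            # ['Age']
print(num.loc["count", "Age"])      # 2.0

# include="all" also summarises text columns (count, unique, top, freq).
full = df_demo.describe(include="all")
print(list(full.columns))           # ['Sex', 'Age']
print(full.loc["top", "Sex"])       # female
```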

### Exercise 2.1

With the avocado data frame from Exercise 0.1:

Please display its columns.

Display the stats of this data frame:

Display all the entries of this data frame in the column ``Total Bags``.

Convert your data frame to a numpy array and then print it.

### Transposing your data

You may have heard of the transpose operation. For matrices, a transpose swaps the rows and columns. This operation makes sense for numpy arrays and DataFrames as well.

In [None]:
my_matrix = np.array([[1, 2], [3, 4]])
print(f"Original matrix \n {my_matrix}")
print(f"Transposed matrix \n {my_matrix.T}")

Original matrix 
 [[1 2]
 [3 4]]
Transposed matrix 
 [[1 3]
 [2 4]]


In [None]:
df_titanic_t = df_titanic.T
df_titanic_t

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,...,851,852,853,854,855,856,857,858,859,860,861,862,863,864,865,866,867,868,869,870,871,872,873,874,875,876,877,878,879,880,881,882,883,884,885,886,887,888,889,890
PassengerId,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,...,852,853,854,855,856,857,858,859,860,861,862,863,864,865,866,867,868,869,870,871,872,873,874,875,876,877,878,879,880,881,882,883,884,885,886,887,888,889,890,891
Name,"Braund, Mr. Owen Harris","Cumings, Mrs. John Bradley (Florence Briggs Th...","Heikkinen, Miss. Laina","Futrelle, Mrs. Jacques Heath (Lily May Peel)","Allen, Mr. William Henry","Moran, Mr. James","McCarthy, Mr. Timothy J","Palsson, Master. Gosta Leonard","Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)","Nasser, Mrs. Nicholas (Adele Achem)","Sandstrom, Miss. Marguerite Rut","Bonnell, Miss. Elizabeth","Saundercock, Mr. William Henry","Andersson, Mr. Anders Johan","Vestrom, Miss. Hulda Amanda Adolfina","Hewlett, Mrs. (Mary D Kingcome)","Rice, Master. Eugene","Williams, Mr. Charles Eugene","Vander Planke, Mrs. Julius (Emelia Maria Vande...","Masselmani, Mrs. Fatima","Fynney, Mr. Joseph J","Beesley, Mr. Lawrence","McGowan, Miss. Anna ""Annie""","Sloper, Mr. William Thompson","Palsson, Miss. Torborg Danira","Asplund, Mrs. Carl Oscar (Selma Augusta Emilia...","Emir, Mr. Farred Chehab","Fortune, Mr. Charles Alexander","O'Dwyer, Miss. Ellen ""Nellie""","Todoroff, Mr. Lalio","Uruchurtu, Don. Manuel E","Spencer, Mrs. William Augustus (Marie Eugenie)","Glynn, Miss. Mary Agatha","Wheadon, Mr. Edward H","Meyer, Mr. Edgar Joseph","Holverson, Mr. Alexander Oskar","Mamee, Mr. Hanna","Cann, Mr. Ernest Charles","Vander Planke, Miss. Augusta Maria","Nicola-Yarred, Miss. Jamila",...,"Svensson, Mr. Johan","Boulos, Miss. Nourelain","Lines, Miss. Mary Conover","Carter, Mrs. Ernest Courtenay (Lilian Hughes)","Aks, Mrs. Sam (Leah Rosen)","Wick, Mrs. George Dennick (Mary Hitchcock)","Daly, Mr. Peter Denis","Baclini, Mrs. Solomon (Latifa Qurban)","Razi, Mr. Raihed","Hansen, Mr. Claus Peter","Giles, Mr. Frederick Edward","Swift, Mrs. Frederick Joel (Margaret Welles Ba...","Sage, Miss. Dorothy Edith ""Dolly""","Gill, Mr. John William","Bystrom, Mrs. (Karolina)","Duran y More, Miss. Asuncion","Roebling, Mr. Washington Augustus II","van Melkebeke, Mr. Philemon","Johnson, Master. Harold Theodor","Balkic, Mr. Cerin","Beckwith, Mrs. Richard Leonard (Sallie Monypeny)","Carlsson, Mr. Frans Olof","Vander Cruyssen, Mr. Victor","Abelson, Mrs. Samuel (Hannah Wizosky)","Najib, Miss. Adele Kiamie ""Jane""","Gustafsson, Mr. Alfred Ossian","Petroff, Mr. Nedelio","Laleff, Mr. Kristo","Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)","Shelley, Mrs. William (Imanita Parrish Hall)","Markun, Mr. Johann","Dahlberg, Miss. Gerda Ulrika","Banfield, Mr. Frederick James","Sutehall, Mr. Henry Jr","Rice, Mrs. William (Margaret Norton)","Montvila, Rev. Juozas","Graham, Miss. Margaret Edith","Johnston, Miss. Catherine Helen ""Carrie""","Behr, Mr. Karl Howell","Dooley, Mr. Patrick"
Sex,male,female,female,female,male,male,male,male,female,female,female,female,male,male,female,female,male,male,female,female,male,male,female,male,female,female,male,male,female,male,male,female,female,male,male,male,male,male,female,female,...,male,female,female,female,female,female,male,female,male,male,male,female,female,male,female,female,male,male,male,male,female,male,male,female,female,male,male,male,female,female,male,female,male,male,female,male,female,female,male,male
Age,22,38,26,35,35,,54,2,27,14,4,58,20,39,14,55,2,,31,,35,34,15,28,8,38,,19,,,40,,,66,28,42,,21,18,14,...,74,9,16,44,18,45,51,24,,41,21,48,,24,42,27,31,,4,26,47,33,47,28,15,20,19,,56,25,33,22,28,25,39,27,19,,26,32
Ticket,A/5 21171,PC 17599,STON/O2. 3101282,113803,373450,330877,17463,349909,347742,237736,PP 9549,113783,A/5. 2151,347082,350406,248706,382652,244373,345763,2649,239865,248698,330923,113788,349909,347077,2631,19950,330959,349216,PC 17601,PC 17569,335677,C.A. 24579,PC 17604,113789,2677,A./5. 2152,345764,2651,...,347060,2678,PC 17592,244252,392091,36928,113055,2666,2629,350026,28134,17466,CA. 2343,233866,236852,SC/PARIS 2149,PC 17590,345777,347742,349248,11751,695,345765,P/PP 3381,2667,7534,349212,349217,11767,230433,349257,7552,C.A./SOTON 34068,SOTON/OQ 392076,382652,211536,112053,W./C. 6607,111369,370376
Fare,7.25,71.2833,7.925,53.1,8.05,8.4583,51.8625,21.075,11.1333,30.0708,16.7,26.55,8.05,31.275,7.8542,16,29.125,13,18,7.225,26,13,8.0292,35.5,21.075,31.3875,7.225,263,7.8792,7.8958,27.7208,146.521,7.75,10.5,82.1708,52,7.2292,8.05,18,11.2417,...,7.775,15.2458,39.4,26,9.35,164.867,26.55,19.2583,7.2292,14.1083,11.5,25.9292,69.55,13,13,13.8583,50.4958,9.5,11.1333,7.8958,52.5542,5,9,24,7.225,9.8458,7.8958,7.8958,83.1583,26,7.8958,10.5167,10.5,7.05,29.125,13,30,23.45,30,7.75
Cabin,,C85,,C123,,,E46,,,,G6,C103,,,,,,,,,,D56,,A6,,,,C23 C25 C27,,,,B78,,,,,,,,,...,,,D28,,,,E17,,,,,D17,,,,,A24,,,,D35,B51 B53 B55,,,,,,,C50,,,,,,,,B42,,C148,
Survived,0,1,1,1,0,0,0,0,1,1,1,1,0,0,0,1,0,1,0,1,0,1,1,1,0,1,0,0,1,0,0,1,1,0,0,0,1,0,0,1,...,0,0,1,0,1,1,1,1,0,0,0,1,0,0,1,1,0,0,1,0,1,0,0,1,1,0,0,0,1,1,0,0,0,0,0,0,1,0,1,0


With this particular dataset, transposing is not the most useful thing to do. However, it gives us an interesting DataFrame. Let's spend some time on it.
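One way to see why transposing a mixed-type table is rarely helpful is to compare shapes and dtypes before and after. This is a sketch on a hypothetical three-row frame ``df_demo``. The exact dtype printout may vary between pandas versions, so no expected output is given for it.

```python
import pandas as pd

# Hypothetical frame with one text column and one integer column.
df_demo = pd.DataFrame({"Name": ["A", "B", "C"], "Age": [22, 38, 26]})
df_demo_t = df_demo.T

# Rows and columns swap, so the shape is reversed.
print(df_demo.shape, df_demo_t.shape)   # (3, 2) (2, 3)

# Before: each column has its own dtype.
print(df_demo.dtypes)

# After: each column holds one name and one age, so pandas typically
# falls back to the generic object dtype for every column.
print(df_demo_t.dtypes)

# Transposing twice restores the original layout.
round_trip = df_demo_t.T
print(round_trip.shape)                 # (3, 2)
```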

### Exercise 2.2

Display the index of the above transposed titanic DataFrame.

Note that the result is an ``Index`` object, but its elements can be accessed by position, just like those of a list.