# Pandas Data Structure

## Creating your own data

### Series

`Series` == one-dimensional container, similar to a Python `list`, except that all elements share a single `dtype` (mixed values fall back to `object`)

Represents each column of `DataFrame`.
`DataFrame` ~~ a dictionary of `Series` objects where key=column name, value=Series

In [40]:
import pandas as pd
import numpy as np
from collections import OrderedDict

In [41]:
s = pd.Series(['banana', 42])
s

0    banana
1        42
dtype: object
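
The `object` dtype in the output above is what pandas falls back to when the values do not share one type. A minimal sketch of the difference (the names `ints` and `mixed` are illustrative and not part of the original notebook):

```python
import pandas as pd

# All elements are integers, so pandas picks an integer dtype
ints = pd.Series([1, 2, 3])

# A string and an integer have no common numeric type,
# so pandas stores generic Python objects instead
mixed = pd.Series(['banana', 42])

print(ints.dtype)
print(mixed.dtype)
```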

In [42]:
# Index can be assigned to the Series
# manually assign index values to a series
# by passing a Python list
s = pd.Series(['Wes McKinney', 'Creator of Pandas'], index=['Person', 'Who'])
s

Person         Wes McKinney
Who       Creator of Pandas
dtype: object

Question: What happens if a `tuple`, `dict`, or `numpy.ndarray` is passed instead of a `list`?

Answer: All of them work, as the next cells show

In [43]:
# Tuple t
t = ('Wes McKinney', 'Creator of Pandas')
i = ('Person', 'Who')
s = pd.Series(t, index=i)
s

Person         Wes McKinney
Who       Creator of Pandas
dtype: object

In [44]:
# Dict d
d = {
    'Person': 'Wes McKinney',
    'Who': 'Creator of Pandas'
}
s = pd.Series(d)
s

Person         Wes McKinney
Who       Creator of Pandas
dtype: object

In [45]:
# Numpy ndarray n
l = ['Wes McKinney', 'Creator of Pandas']
n = np.array(l)
s = pd.Series(n, index=np.array(['Person', 'Who']))
s

Person         Wes McKinney
Who       Creator of Pandas
dtype: object

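Question: Does passing in an `index` when you use a `dict` overwrite the index? Or does it sort the values?

Answer: Neither. The dict keys stay attached to their values; `index` only selects and orders them by label, and a label that is not a key in the dict gets `NaN`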

In [46]:
# Dict d
d = {
    'Person': 'Wes McKinney',
    'Who': 'Creator of Pandas'
}
i = ['Who', 'Person']
s = pd.Series(d, index=i)
s

Who       Creator of Pandas
Person         Wes McKinney
dtype: object
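
To see how a `dict` and an explicit `index` interact, it helps to include a label that is not a key in the dict. A small sketch, where `'Born'` is a made-up label used only for illustration:

```python
import pandas as pd

d = {
    'Person': 'Wes McKinney',
    'Who': 'Creator of Pandas'
}

# 'Who' is looked up by key; 'Born' (hypothetical) is not in the dict;
# 'Person' is not requested, so it is left out of the result
s = pd.Series(d, index=['Who', 'Born'])
print(s)
```

The value for `'Who'` stays attached to its key, `'Born'` comes back as `NaN`, and `'Person'` is dropped, so the `index` argument acts as a label-based selection rather than a relabelling.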

### DataFrame

`DataFrame` == dictionary of `Series` objects.
Where `key`=column name, `values`=contents of column
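
The "dictionary of `Series`" picture can be checked directly. A minimal sketch, using two small hand-made Series (the names `names`, `ages` and `df` are illustrative):

```python
import pandas as pd

names = pd.Series(['Rosaline Franklin', 'William Gosset'])
ages = pd.Series([37, 61])

# Each dict key becomes a column name, each Series becomes that column
df = pd.DataFrame({'Name': names, 'Age': ages})

# Pulling a column back out returns a Series again
print(type(df['Age']))
print(list(df.columns))
```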

In [47]:
scientists = pd.DataFrame({
    'Name': ['Rosaline Franklin', 'William Gosset'],
    'Occupation': ['Chemist', 'Statistician'],
    'Born': ['1920-07-25', '1876-06-13'],
    'Died': ['1958-04-16', '1937-10-16'],
    'Age': [37, 61]
})
scientists

Unnamed: 0,Name,Occupation,Born,Died,Age
0,Rosaline Franklin,Chemist,1920-07-25,1958-04-16,37
1,William Gosset,Statistician,1876-06-13,1937-10-16,61


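Notice: with a plain `dict`, column order is only guaranteed on Python 3.7+ (where dicts keep insertion order); pass `columns=` or use an `OrderedDict` to fix the order explicitly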

In [48]:
scientists = pd.DataFrame({
    'Occupation': ['Chemist', 'Statistician'],
    'Born': ['1920-07-25', '1876-06-13'],
    'Died': ['1958-04-16', '1937-10-16'],
    'Age': [37, 61]
    },
    index=['Rosaline Franklin', 'William Gosset'],
    columns=['Occupation', 'Born', 'Died', 'Age'])
scientists

Unnamed: 0,Occupation,Born,Died,Age
Rosaline Franklin,Chemist,1920-07-25,1958-04-16,37
William Gosset,Statistician,1876-06-13,1937-10-16,61


In [49]:
# Using OrderedDict
# note the round brackets after OrderedDict
# then we pass a list of 2-tuples
scientists = pd.DataFrame(
    OrderedDict([
        ('Name', ['Rosaline Franklin', 'William Gosset']),
        ('Occupation', ['Chemist', 'Statistician']),
        ('Born', ['1920-07-25', '1876-06-13']),
        ('Died', ['1958-04-16', '1937-10-16']),
        ('Age', [37, 61])
    ])
)
scientists

Unnamed: 0,Name,Occupation,Born,Died,Age
0,Rosaline Franklin,Chemist,1920-07-25,1958-04-16,37
1,William Gosset,Statistician,1876-06-13,1937-10-16,61


## Series

In [50]:
# create example dataframe
scientists = pd.DataFrame(
    data={'Occupation': ['Chemist', 'Statistician'],
    'Born': ['1920-07-25', '1876-06-13'],
    'Died': ['1958-04-16', '1937-10-16'],
    'Age': [37, 61]},
    index=['Rosaline Franklin', 'William Gosset'],
    columns=['Occupation', 'Born', 'Died', 'Age']
)
scientists

Unnamed: 0,Occupation,Born,Died,Age
Rosaline Franklin,Chemist,1920-07-25,1958-04-16,37
William Gosset,Statistician,1876-06-13,1937-10-16,61


In [51]:
# select by row index label
first_row = scientists.loc['William Gosset']
type(first_row)

pandas.core.series.Series

In [52]:
first_row

Occupation    Statistician
Born            1876-06-13
Died            1937-10-16
Age                     61
Name: William Gosset, dtype: object

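If we use the `loc` attribute to select a single row of our `scientists` dataframe by its label (here `'William Gosset'`, which is actually the second row despite the variable name `first_row`), we get a `Series` object back.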
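
Whether `loc` returns a `Series` or a `DataFrame` depends on whether a single label or a list of labels is passed. A short sketch on a cut-down version of the example frame (the names `row` and `frame` are illustrative):

```python
import pandas as pd

scientists = pd.DataFrame(
    {'Occupation': ['Chemist', 'Statistician'], 'Age': [37, 61]},
    index=['Rosaline Franklin', 'William Gosset']
)

# A single label returns the row as a Series (column names become the index)
row = scientists.loc['William Gosset']

# A list of labels, even with one element, returns a DataFrame
frame = scientists.loc[['William Gosset']]

print(type(row), row.name)
print(type(frame), frame.shape)
```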

In [53]:
first_row.index

Index(['Occupation', 'Born', 'Died', 'Age'], dtype='object')

In [54]:
first_row.values

array(['Statistician', '1876-06-13', '1937-10-16', 61], dtype=object)

In [55]:
# keys() => index
first_row.keys()

Index(['Occupation', 'Born', 'Died', 'Age'], dtype='object')

In [56]:
first_row.index[0]

'Occupation'

In [57]:
first_row.keys()[0]

'Occupation'

### `pandas.Series` is `numpy.ndarray`-like

`pandas.Series` is very similar to `numpy.ndarray`
Often referred to as a "vector"

In [58]:
ages = scientists['Age']
ages

Rosaline Franklin    37
William Gosset       61
Name: Age, dtype: int64

In [59]:
ages.mean()

49.0

In [60]:
ages.min()

37

In [61]:
ages.max()

61

In [62]:
ages.std()

16.97056274847714

In [63]:
ages.describe()

count     2.000000
mean     49.000000
std      16.970563
min      37.000000
25%      43.000000
50%      49.000000
75%      55.000000
max      61.000000
Name: Age, dtype: float64

In [64]:
scientists.transpose() is scientists

False

In [65]:
t is ages

False

Some of the methods and attributes available on a Series

| func                         | desc                                            |
|------------------------------|-------------------------------------------------|
| append                       | Concatenates 2+ series (removed in pandas 2.0; use `pd.concat`) |
| corr                         | Calc correlation with another Series            |
| cov                          | Calc covar with another Series                  |
| describe                     | Summary statistics                              |
| drop_duplicates              | Returns a copy without duplicates               |
| equals                       | compare two Series                              |
| values                       | get values of the Series                        |
| hist                         | draw histogram                                  |
| min, max, mean, median, mode | self-explanatory                                |
| quantile                     | returns value at a given quantile (0<=q<=1)     |
| replace                      | replace values in Series with a specified value |
| sample                       | return a random sample of values from Series    |
| sort_values                  | sort values                                     |
| to_frame                     | convert to DataFrame                            |
| transpose                    | Returns the transpose                           |
| unique                       | returns a `numpy.ndarray` of unique values      |
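
A few of the entries in the table, tried on the eight ages used later in these notes. This is only a sketch: the Series is typed in by hand, and `pd.concat` stands in for `append`, which newer pandas versions no longer provide.

```python
import pandas as pd

ages = pd.Series([37, 61, 90, 66, 56, 45, 41, 77], name='Age')

print(ages.median())          # middle value
print(ages.quantile(0.25))    # value at the 25% quantile
print(ages.sort_values(ascending=False).head(3).tolist())

# Concatenate with another Series, then drop the repeated value
both = pd.concat([ages, pd.Series([37])], ignore_index=True)
print(len(both), both.drop_duplicates().tolist())

# to_frame turns the Series into a one-column DataFrame
print(ages.to_frame().shape)
```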

### Boolean Subsetting

In [66]:
path = '../data/scientists.csv'
scientists = pd.read_csv(path)
scientists

Unnamed: 0,Name,Born,Died,Age,Occupation
0,Rosaline Franklin,1920-07-25,1958-04-16,37,Chemist
1,William Gosset,1876-06-13,1937-10-16,61,Statistician
2,Florence Nightingale,1820-05-12,1910-08-13,90,Nurse
3,Marie Curie,1867-11-07,1934-07-04,66,Chemist
4,Rachel Carson,1907-05-27,1964-04-14,56,Biologist
5,John Snow,1813-03-15,1858-06-16,45,Physician
6,Alan Turing,1912-06-23,1954-06-07,41,Computer Scientist
7,Johann Gauss,1777-04-30,1855-02-23,77,Mathematician


In [67]:
ages = scientists['Age']
ages

0    37
1    61
2    90
3    66
4    56
5    45
6    41
7    77
Name: Age, dtype: int64

In [68]:
ages.describe()

count     8.000000
mean     59.125000
std      18.325918
min      37.000000
25%      44.000000
50%      58.500000
75%      68.750000
max      90.000000
Name: Age, dtype: float64

In [69]:
ages.mean()

59.125

In [70]:
ages > ages.mean()

0    False
1     True
2     True
3     True
4    False
5    False
6    False
7     True
Name: Age, dtype: bool

In [71]:
ages[ages > ages.mean()]

1    61
2    90
3    66
7    77
Name: Age, dtype: int64
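
Boolean masks can also be combined. A small sketch with the same ages typed in by hand; note that `&`, `|` and `~` are used instead of `and`, `or` and `not`, and that each comparison needs its own parentheses:

```python
import pandas as pd

ages = pd.Series([37, 61, 90, 66, 56, 45, 41, 77], name='Age')

# Both conditions must hold: older than 40 AND younger than 70
between = ages[(ages > 40) & (ages < 70)]

# ~ negates a mask: everything that is NOT above the mean
not_above_mean = ages[~(ages > ages.mean())]

print(between)
print(not_above_mean)
```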

### Operations are automatically aligned and vectorized

If you perform an operation between two vectors of the same length, the resulting vector will be an element-by-element calculation of the vectors

In [72]:
ages

0    37
1    61
2    90
3    66
4    56
5    45
6    41
7    77
Name: Age, dtype: int64

In [73]:
ages + ages

0     74
1    122
2    180
3    132
4    112
5     90
6     82
7    154
Name: Age, dtype: int64

In [74]:
ages * ages

0    1369
1    3721
2    8100
3    4356
4    3136
5    2025
6    1681
7    5929
Name: Age, dtype: int64

vector + scalar --> scalar will be recycled across all the elements in the vector

In [75]:
ages + 100

0    137
1    161
2    190
3    166
4    156
5    145
6    141
7    177
Name: Age, dtype: int64

In [76]:
ages * 2

0     74
1    122
2    180
3    132
4    112
5     90
6     82
7    154
Name: Age, dtype: int64

When performing operations on vectors of different lengths, the behavior depends on the `type` of the vectors.
With two `Series`, the operation is matched by index label; labels present in only one of them are filled with `NaN` (which is why the result is upcast to `float64`)

In [77]:
ages + pd.Series([1, 100])

0     38.0
1    161.0
2      NaN
3      NaN
4      NaN
5      NaN
6      NaN
7      NaN
dtype: float64
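
If the `NaN` values are not wanted, the method form of the operator accepts a `fill_value` that is substituted for the missing side before the calculation. A minimal sketch, with the ages typed in by hand and `0` chosen as the fill value purely for illustration:

```python
import pandas as pd

ages = pd.Series([37, 61, 90, 66, 56, 45, 41, 77], name='Age')
short = pd.Series([1, 100])

# Labels 2..7 are missing from `short`; treat them as 0 instead of NaN
filled = ages.add(short, fill_value=0)
print(filled)
```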

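With a `numpy.ndarray` of a different length there are no labels to align on, so `ages + np.array([1, 100])` raises a `ValueError`: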

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_20824/1930795539.py in <module>
----> 1 ages + np.array([1, 100])

~/dsktlab-intake/venv/lib/python3.8/site-packages/pandas/core/ops/common.py in new_method(self, other)
     67         other = item_from_zerodim(other)
     68 
---> 69         return method(self, other)
     70 
     71     return new_method

~/dsktlab-intake/venv/lib/python3.8/site-packages/pandas/core/arraylike.py in __add__(self, other)
     90     @unpack_zerodim_and_defer("__add__")
     91     def __add__(self, other):
---> 92         return self._arith_method(other, operator.add)
     93 
     94     @unpack_zerodim_and_defer("__radd__")

~/dsktlab-intake/venv/lib/python3.8/site-packages/pandas/core/series.py in _arith_method(self, other, op)
   5524 
   5525         with np.errstate(all="ignore"):
-> 5526             result = ops.arithmetic_op(lvalues, rvalues, op)
   5527 
   5528         return self._construct_result(result, name=res_name)

~/dsktlab-intake/venv/lib/python3.8/site-packages/pandas/core/ops/array_ops.py in arithmetic_op(left, right, op)
    222         _bool_arith_check(op, left, right)
    223 
--> 224         res_values = _na_arithmetic_op(left, right, op)
    225 
    226     return res_values

~/dsktlab-intake/venv/lib/python3.8/site-packages/pandas/core/ops/array_ops.py in _na_arithmetic_op(left, right, op, is_cmp)
    164 
    165     try:
--> 166         result = func(left, right)
    167     except TypeError:
    168         if is_object_dtype(left) or is_object_dtype(right) and not is_cmp:

~/dsktlab-intake/venv/lib/python3.8/site-packages/pandas/core/computation/expressions.py in evaluate(op, a, b, use_numexpr)
    237         if use_numexpr:
    238             # error: "None" not callable
--> 239             return _evaluate(op, op_str, a, b)  # type: ignore[misc]
    240     return _evaluate_standard(op, op_str, a, b)
    241 

~/dsktlab-intake/venv/lib/python3.8/site-packages/pandas/core/computation/expressions.py in _evaluate_standard(op, op_str, a, b)
     67     if _TEST_MODE:
     68         _store_test_result(False)
---> 69     return op(a, b)
     70 
     71 

ValueError: operands could not be broadcast together with shapes (8,) (2,) 
```

Vectors with common index labels

In [78]:
ages

0    37
1    61
2    90
3    66
4    56
5    45
6    41
7    77
Name: Age, dtype: int64

In [79]:
rev_ages = ages.sort_index(ascending=False)
rev_ages

7    77
6    41
5    45
4    56
3    66
2    90
1    61
0    37
Name: Age, dtype: int64

In [80]:
ages * 2

0     74
1    122
2    180
3    132
4    112
5     90
6     82
7    154
Name: Age, dtype: int64

In [81]:
ages + rev_ages

0     74
1    122
2    180
3    132
4    112
5     90
6     82
7    154
Name: Age, dtype: int64
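
`ages + rev_ages` gives the same result as `ages * 2` because the addition is matched on index labels, not on position. A short sketch on three values that contrasts this with a purely positional addition on the underlying arrays (the names `aligned` and `positional` are illustrative):

```python
import pandas as pd

ages = pd.Series([37, 61, 90])
rev_ages = ages.sort_index(ascending=False)   # same labels, reversed order

# Series + Series: label 0 is added to label 0, and so on
aligned = ages + rev_ages

# Plain arrays have no labels, so elements are added by position
positional = ages.to_numpy() + rev_ages.to_numpy()

print(aligned.tolist())
print(positional.tolist())
```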

## DataFrame

Boolean Subsetting: DataFrames

In [82]:
scientists

Unnamed: 0,Name,Born,Died,Age,Occupation
0,Rosaline Franklin,1920-07-25,1958-04-16,37,Chemist
1,William Gosset,1876-06-13,1937-10-16,61,Statistician
2,Florence Nightingale,1820-05-12,1910-08-13,90,Nurse
3,Marie Curie,1867-11-07,1934-07-04,66,Chemist
4,Rachel Carson,1907-05-27,1964-04-14,56,Biologist
5,John Snow,1813-03-15,1858-06-16,45,Physician
6,Alan Turing,1912-06-23,1954-06-07,41,Computer Scientist
7,Johann Gauss,1777-04-30,1855-02-23,77,Mathematician


In [83]:
scientists[scientists['Age'] > scientists['Age'].mean()]

Unnamed: 0,Name,Born,Died,Age,Occupation
1,William Gosset,1876-06-13,1937-10-16,61,Statistician
2,Florence Nightingale,1820-05-12,1910-08-13,90,Nurse
3,Marie Curie,1867-11-07,1934-07-04,66,Chemist
7,Johann Gauss,1777-04-30,1855-02-23,77,Mathematician


In [84]:
scientists[['Name', 'Age']].loc[[0, 3]]

Unnamed: 0,Name,Age
0,Rosaline Franklin,37
3,Marie Curie,66
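
The chained selection above can also be written as a single `loc` call that takes the row labels and the column names together. A sketch on the first four rows, typed in by hand:

```python
import pandas as pd

scientists = pd.DataFrame({
    'Name': ['Rosaline Franklin', 'William Gosset',
             'Florence Nightingale', 'Marie Curie'],
    'Age': [37, 61, 90, 66],
    'Occupation': ['Chemist', 'Statistician', 'Nurse', 'Chemist'],
})

# Two steps: pick columns first, then rows
chained = scientists[['Name', 'Age']].loc[[0, 3]]

# One step: loc[row_labels, column_labels]
single = scientists.loc[[0, 3], ['Name', 'Age']]

print(single)
print(chained.equals(single))
```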


In [85]:
first_half = scientists[:4]
second_half = scientists[4:]

In [86]:
first_half

Unnamed: 0,Name,Born,Died,Age,Occupation
0,Rosaline Franklin,1920-07-25,1958-04-16,37,Chemist
1,William Gosset,1876-06-13,1937-10-16,61,Statistician
2,Florence Nightingale,1820-05-12,1910-08-13,90,Nurse
3,Marie Curie,1867-11-07,1934-07-04,66,Chemist


In [87]:
second_half

Unnamed: 0,Name,Born,Died,Age,Occupation
4,Rachel Carson,1907-05-27,1964-04-14,56,Biologist
5,John Snow,1813-03-15,1858-06-16,45,Physician
6,Alan Turing,1912-06-23,1954-06-07,41,Computer Scientist
7,Johann Gauss,1777-04-30,1855-02-23,77,Mathematician


In [88]:
scientists * 2

Unnamed: 0,Name,Born,Died,Age,Occupation
0,Rosaline FranklinRosaline Franklin,1920-07-251920-07-25,1958-04-161958-04-16,74,ChemistChemist
1,William GossetWilliam Gosset,1876-06-131876-06-13,1937-10-161937-10-16,122,StatisticianStatistician
2,Florence NightingaleFlorence Nightingale,1820-05-121820-05-12,1910-08-131910-08-13,180,NurseNurse
3,Marie CurieMarie Curie,1867-11-071867-11-07,1934-07-041934-07-04,132,ChemistChemist
4,Rachel CarsonRachel Carson,1907-05-271907-05-27,1964-04-141964-04-14,112,BiologistBiologist
5,John SnowJohn Snow,1813-03-151813-03-15,1858-06-161858-06-16,90,PhysicianPhysician
6,Alan TuringAlan Turing,1912-06-231912-06-23,1954-06-071954-06-07,82,Computer ScientistComputer Scientist
7,Johann GaussJohann Gauss,1777-04-301777-04-30,1855-02-231855-02-23,154,MathematicianMathematician
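
Multiplying the whole frame repeats the strings because `*` is applied to every column according to its own type. If only the numeric columns should be scaled, one option is to select them first; a sketch using `select_dtypes` on a two-row version of the data:

```python
import pandas as pd

scientists = pd.DataFrame({
    'Name': ['Rosaline Franklin', 'William Gosset'],
    'Age': [37, 61],
    'Occupation': ['Chemist', 'Statistician'],
})

# Keep only columns with a numeric dtype, then multiply
numeric = scientists.select_dtypes(include='number')
doubled = numeric * 2

print(doubled)
```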


### Making Changes to Series and DataFrames

Create new `datetime` column 

In [89]:
scientists['Born'].dtype

dtype('O')

In [90]:
scientists['Died'].dtype

dtype('O')

In [91]:
# format 'Born' column as a datetime
born_datetime = pd.to_datetime(scientists['Born'], format='%Y-%m-%d')
born_datetime

0   1920-07-25
1   1876-06-13
2   1820-05-12
3   1867-11-07
4   1907-05-27
5   1813-03-15
6   1912-06-23
7   1777-04-30
Name: Born, dtype: datetime64[ns]

In [92]:
# format 'Died' column as a datetime
died_datetime = pd.to_datetime(scientists['Died'], format='%Y-%m-%d')
died_datetime

0   1958-04-16
1   1937-10-16
2   1910-08-13
3   1934-07-04
4   1964-04-14
5   1858-06-16
6   1954-06-07
7   1855-02-23
Name: Died, dtype: datetime64[ns]

Assigning the newly created Series as new columns in DataFrame

In [93]:
scientists['born_dt'], scientists['died_dt'] = (born_datetime, died_datetime)
scientists

Unnamed: 0,Name,Born,Died,Age,Occupation,born_dt,died_dt
0,Rosaline Franklin,1920-07-25,1958-04-16,37,Chemist,1920-07-25,1958-04-16
1,William Gosset,1876-06-13,1937-10-16,61,Statistician,1876-06-13,1937-10-16
2,Florence Nightingale,1820-05-12,1910-08-13,90,Nurse,1820-05-12,1910-08-13
3,Marie Curie,1867-11-07,1934-07-04,66,Chemist,1867-11-07,1934-07-04
4,Rachel Carson,1907-05-27,1964-04-14,56,Biologist,1907-05-27,1964-04-14
5,John Snow,1813-03-15,1858-06-16,45,Physician,1813-03-15,1858-06-16
6,Alan Turing,1912-06-23,1954-06-07,41,Computer Scientist,1912-06-23,1954-06-07
7,Johann Gauss,1777-04-30,1855-02-23,77,Mathematician,1777-04-30,1855-02-23


In [94]:
scientists.shape

(8, 7)

Directly change a column

In [95]:
scientists['Age']

0    37
1    61
2    90
3    66
4    56
5    45
6    41
7    77
Name: Age, dtype: int64

In [96]:
import random

random.seed(42)
random.shuffle(scientists['Age'])

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  x[i], x[j] = x[j], x[i]


In [97]:
scientists['Age']

0    66
1    56
2    41
3    77
4    90
5    45
6    37
7    61
Name: Age, dtype: int64

In [98]:
scientists['Age'] = scientists['Age'].\
    sample(len(scientists['Age']), random_state=24).\
    reset_index(drop=True)

# reset_index is required: sample() keeps the original row labels, so on
# assignment the values would re-align to the index and fall back into pre-sample order

`random.shuffle` works directly on the column and shuffles it in place (triggering the copy-of-a-slice warning shown above).
`sample` returns a new, shuffled Series, which is then assigned back to the DataFrame column.
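
The re-alignment that makes `reset_index` necessary can be seen on a small frame. In this sketch the column names `realigned` and `shuffled` are made up for illustration, and `to_numpy()` is used as an alternative way of discarding the labels before assigning:

```python
import pandas as pd

df = pd.DataFrame({'Age': [37, 61, 90, 66]})

# sample(frac=1) returns all rows in random order, keeping their labels
picked = df['Age'].sample(frac=1, random_state=24)

# Assigning the Series matches on labels, so the original order comes back
df['realigned'] = picked

# Assigning the bare values keeps the sampled order
df['shuffled'] = picked.to_numpy()

print(df)
```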

In [99]:
# See #/Series/Operations are automatically aligned and vectorized
scientists['age_days_dt'] = (scientists['died_dt'] - scientists['born_dt'])
scientists

Unnamed: 0,Name,Born,Died,Age,Occupation,born_dt,died_dt,age_days_dt
0,Rosaline Franklin,1920-07-25,1958-04-16,61,Chemist,1920-07-25,1958-04-16,13779 days
1,William Gosset,1876-06-13,1937-10-16,45,Statistician,1876-06-13,1937-10-16,22404 days
2,Florence Nightingale,1820-05-12,1910-08-13,37,Nurse,1820-05-12,1910-08-13,32964 days
3,Marie Curie,1867-11-07,1934-07-04,90,Chemist,1867-11-07,1934-07-04,24345 days
4,Rachel Carson,1907-05-27,1964-04-14,56,Biologist,1907-05-27,1964-04-14,20777 days
5,John Snow,1813-03-15,1858-06-16,66,Physician,1813-03-15,1858-06-16,16529 days
6,Alan Turing,1912-06-23,1954-06-07,77,Computer Scientist,1912-06-23,1954-06-07,15324 days
7,Johann Gauss,1777-04-30,1855-02-23,41,Mathematician,1777-04-30,1855-02-23,28422 days


In [100]:
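# convert the value to just the year using astype method
# (this ran on the pandas version used here; pandas 2.x appears to accept only 's', 'ms', 'us', 'ns' as units)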
scientists['age_years_dt'] = scientists['age_days_dt'].astype('timedelta64[Y]')
scientists

Unnamed: 0,Name,Born,Died,Age,Occupation,born_dt,died_dt,age_days_dt,age_years_dt
0,Rosaline Franklin,1920-07-25,1958-04-16,61,Chemist,1920-07-25,1958-04-16,13779 days,37.0
1,William Gosset,1876-06-13,1937-10-16,45,Statistician,1876-06-13,1937-10-16,22404 days,61.0
2,Florence Nightingale,1820-05-12,1910-08-13,37,Nurse,1820-05-12,1910-08-13,32964 days,90.0
3,Marie Curie,1867-11-07,1934-07-04,90,Chemist,1867-11-07,1934-07-04,24345 days,66.0
4,Rachel Carson,1907-05-27,1964-04-14,56,Biologist,1907-05-27,1964-04-14,20777 days,56.0
5,John Snow,1813-03-15,1858-06-16,66,Physician,1813-03-15,1858-06-16,16529 days,45.0
6,Alan Turing,1912-06-23,1954-06-07,77,Computer Scientist,1912-06-23,1954-06-07,15324 days,41.0
7,Johann Gauss,1777-04-30,1855-02-23,41,Mathematician,1777-04-30,1855-02-23,28422 days,77.0
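
A whole-year age can also be derived from the day counts without relying on a year unit for `timedelta64`. This is a sketch under one stated assumption: a year is approximated as 365.25 days, which is not how the notebook computes it and can be off by one near a birthday.

```python
import pandas as pd

born = pd.to_datetime(pd.Series(['1920-07-25', '1876-06-13']))
died = pd.to_datetime(pd.Series(['1958-04-16', '1937-10-16']))

# Subtracting two datetime Series gives a timedelta Series
age_days = died - born

# .dt.days extracts the integer day count; floor-divide by an average year
age_years = (age_days.dt.days // 365.25).astype(int)

print(age_days.dt.days.tolist())
print(age_years.tolist())
```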


### Dropping Values

To drop a column:

  - select the columns to keep using subsetting techniques, or
  - remove it with the `drop()` method

In [101]:
scientists.columns

Index(['Name', 'Born', 'Died', 'Age', 'Occupation', 'born_dt', 'died_dt',
       'age_days_dt', 'age_years_dt'],
      dtype='object')

In [102]:
# drop the shuffled age column
# axis=1 argument to drop column-wise
scientists_dropped = scientists.drop(['Age'], axis=1)

scientists_dropped.columns

Index(['Name', 'Born', 'Died', 'Occupation', 'born_dt', 'died_dt',
       'age_days_dt', 'age_years_dt'],
      dtype='object')
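
Both options for dropping a column can be compared side by side. A minimal sketch on a small frame; `drop(columns=...)` is an equivalent spelling of `drop(..., axis=1)`, and neither approach modifies the original frame:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Rosaline Franklin', 'William Gosset'],
    'Age': [37, 61],
    'Occupation': ['Chemist', 'Statistician'],
})

# Option 1: keep only the wanted columns by subsetting
kept = df[['Name', 'Occupation']]

# Option 2: name the columns to remove
dropped = df.drop(columns=['Age'])

print(list(dropped.columns))
print('Age' in df.columns)   # the original still has the column
```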

## Export and Import

### `pickle` -- binary format

#### Series

In [103]:
names = scientists['Name']
names

0       Rosaline Franklin
1          William Gosset
2    Florence Nightingale
3             Marie Curie
4           Rachel Carson
5               John Snow
6             Alan Turing
7            Johann Gauss
Name: Name, dtype: object

In [104]:
names.to_pickle('../output/scientists_name_series.pickle')

#### DataFrame

In [105]:
scientists.to_pickle('../output/scientists_df.pickle')

#### Reading from pickle data

In [106]:
names_from_pickle = pd.read_pickle('../output/scientists_name_series.pickle')
names_from_pickle

0       Rosaline Franklin
1          William Gosset
2    Florence Nightingale
3             Marie Curie
4           Rachel Carson
5               John Snow
6             Alan Turing
7            Johann Gauss
Name: Name, dtype: object

In [107]:
scientists_from_pickle = pd.read_pickle('../output/scientists_df.pickle')
scientists_from_pickle

Unnamed: 0,Name,Born,Died,Age,Occupation,born_dt,died_dt,age_days_dt,age_years_dt
0,Rosaline Franklin,1920-07-25,1958-04-16,61,Chemist,1920-07-25,1958-04-16,13779 days,37.0
1,William Gosset,1876-06-13,1937-10-16,45,Statistician,1876-06-13,1937-10-16,22404 days,61.0
2,Florence Nightingale,1820-05-12,1910-08-13,37,Nurse,1820-05-12,1910-08-13,32964 days,90.0
3,Marie Curie,1867-11-07,1934-07-04,90,Chemist,1867-11-07,1934-07-04,24345 days,66.0
4,Rachel Carson,1907-05-27,1964-04-14,56,Biologist,1907-05-27,1964-04-14,20777 days,56.0
5,John Snow,1813-03-15,1858-06-16,66,Physician,1813-03-15,1858-06-16,16529 days,45.0
6,Alan Turing,1912-06-23,1954-06-07,77,Computer Scientist,1912-06-23,1954-06-07,15324 days,41.0
7,Johann Gauss,1777-04-30,1855-02-23,41,Mathematician,1777-04-30,1855-02-23,28422 days,77.0
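
A pickle round trip preserves dtypes such as `datetime64`, which plain-text formats do not. A self-contained sketch that writes to a temporary directory instead of `../output/` (the file name and the two-row frame are illustrative). Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    'Name': ['Rosaline Franklin', 'William Gosset'],
    'born_dt': pd.to_datetime(['1920-07-25', '1876-06-13']),
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'scientists_df.pickle')
    df.to_pickle(path)
    restored = pd.read_pickle(path)

print(restored.equals(df))
print(restored.dtypes)
```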


### CSV

In [108]:
names.to_csv('../output/scientist_names_series.csv')

In [109]:
scientists.to_csv('../output/scientists_df.tsv', sep='\t')

Removing Row Index from output

In [110]:
# do not write the row index in the csv output
scientists.to_csv('../output/scientists_df_no_index.csv', index=False)

#### Importing CSV

`pandas.read_csv`
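
A short sketch of reading CSV data back in, using an in-memory buffer instead of the files under `../output/` so that it runs on its own. It shows why `index=False` matters on export: a written row index comes back as an extra `Unnamed: 0` column. The `parse_dates` argument is an optional extra that converts the named column to `datetime64` while reading.

```python
import io

import pandas as pd

df = pd.DataFrame({
    'Name': ['Rosaline Franklin', 'William Gosset'],
    'Born': ['1920-07-25', '1876-06-13'],
})

# Written WITH the row index: it is read back as an unnamed first column
with_index = pd.read_csv(io.StringIO(df.to_csv()))

# Written WITHOUT the row index: the columns round-trip unchanged
no_index = pd.read_csv(io.StringIO(df.to_csv(index=False)),
                       parse_dates=['Born'])

print(list(with_index.columns))
print(no_index.dtypes)
```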

### Excel

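In older pandas versions the `Series` data structure had no `to_excel` method, which required converting to a `DataFrame` before exporting to Excel. Recent versions do provide `Series.to_excel`; the conversion below works either way.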

In [111]:
# convert Series to DataFrame
names_df = names.to_frame()

In [113]:
import xlwt 

# xlwt is no longer maintained and support for it will be removed in a future version of pandas
# Use openpyxl and an .xlsx file instead
names_df.to_excel('../output/scientists_names_series_df.xls')



In [115]:
import openpyxl
names_df.to_excel('../output/scientists_names_series_df.xlsx')

In [117]:
scientists.to_excel('../output/scientists_df.xlsx',
    sheet_name='scientists',
    index=False)
scientists

Unnamed: 0,Name,Born,Died,Age,Occupation,born_dt,died_dt,age_days_dt,age_years_dt
0,Rosaline Franklin,1920-07-25,1958-04-16,61,Chemist,1920-07-25,1958-04-16,13779 days,37.0
1,William Gosset,1876-06-13,1937-10-16,45,Statistician,1876-06-13,1937-10-16,22404 days,61.0
2,Florence Nightingale,1820-05-12,1910-08-13,37,Nurse,1820-05-12,1910-08-13,32964 days,90.0
3,Marie Curie,1867-11-07,1934-07-04,90,Chemist,1867-11-07,1934-07-04,24345 days,66.0
4,Rachel Carson,1907-05-27,1964-04-14,56,Biologist,1907-05-27,1964-04-14,20777 days,56.0
5,John Snow,1813-03-15,1858-06-16,66,Physician,1813-03-15,1858-06-16,16529 days,45.0
6,Alan Turing,1912-06-23,1954-06-07,77,Computer Scientist,1912-06-23,1954-06-07,15324 days,41.0
7,Johann Gauss,1777-04-30,1855-02-23,41,Mathematician,1777-04-30,1855-02-23,28422 days,77.0
