# Basics - Removing and Adding Data
Let's mix it up a bit and change our data source. Now we'll look at some astronauts! \
https://www.kaggle.com/nasa/astronaut-yearbook

In [4]:
import pandas as pd

df = pd.read_csv(r'astronauts.csv')
df.head(1)

Unnamed: 0,Name,Year,Group,Status,Birth Date,Birth Place,Gender,Alma Mater,Undergraduate Major,Graduate Major,Military Rank,Military Branch,Space Flights,Space Flight (hr),Space Walks,Space Walks (hr),Missions,Death Date,Death Mission
0,Joseph M. Acaba,2004.0,19.0,Active,5/17/1967,"Inglewood, CA",Male,University of California-Santa Barbara; Univer...,Geology,Geology,,,2,3307,2,13.0,"STS-119 (Discovery), ISS-31/32 (Soyuz)",,


## Modifying Type of Columns
Changing a column's type is common for time series, categoricals, and converting strings to numeric values.

### Timeseries
Note: we'll cover time series in much more depth in a later chapter; this is mostly a (useful) intro.

In [5]:
birth_date = pd.to_datetime(df['Birth Date'], format="%m/%d/%Y")
birth_date

0     1967-05-17
1     1936-03-07
2     1946-03-03
3     1951-05-20
4     1930-01-20
         ...    
352   1956-08-23
353   1962-07-26
354   1932-02-07
355   1930-09-24
356   1962-06-29
Name: Birth Date, Length: 357, dtype: datetime64[ns]

In [6]:
zarya = pd.to_datetime('1998-11-20', format="%Y-%m-%d")
df['age_at_zarya'] = ((zarya - birth_date).dt.days / 365).to_numpy().astype('int')
df.head(2)

Unnamed: 0,Name,Year,Group,Status,Birth Date,Birth Place,Gender,Alma Mater,Undergraduate Major,Graduate Major,Military Rank,Military Branch,Space Flights,Space Flight (hr),Space Walks,Space Walks (hr),Missions,Death Date,Death Mission,age_at_zarya
0,Joseph M. Acaba,2004.0,19.0,Active,5/17/1967,"Inglewood, CA",Male,University of California-Santa Barbara; Univer...,Geology,Geology,,,2,3307,2,13.0,"STS-119 (Discovery), ISS-31/32 (Soyuz)",,,31
1,Loren W. Acton,,,Retired,3/7/1936,"Lewiston, MT",Male,Montana State University; University of Colorado,Engineering Physics,Solar Physics,,,1,190,0,0.0,STS 51-F (Challenger),,,62


In [7]:
df['birth'] = birth_date
df.head(2)

Unnamed: 0,Name,Year,Group,Status,Birth Date,Birth Place,Gender,Alma Mater,Undergraduate Major,Graduate Major,...,Military Branch,Space Flights,Space Flight (hr),Space Walks,Space Walks (hr),Missions,Death Date,Death Mission,age_at_zarya,birth
0,Joseph M. Acaba,2004.0,19.0,Active,5/17/1967,"Inglewood, CA",Male,University of California-Santa Barbara; Univer...,Geology,Geology,...,,2,3307,2,13.0,"STS-119 (Discovery), ISS-31/32 (Soyuz)",,,31,1967-05-17
1,Loren W. Acton,,,Retired,3/7/1936,"Lewiston, MT",Male,Montana State University; University of Colorado,Engineering Physics,Solar Physics,...,,1,190,0,0.0,STS 51-F (Challenger),,,62,1936-03-07
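Once a column holds real datetimes, the `.dt` accessor exposes its components. The following is a minimal sketch on a tiny hand-made Series (the two dates are copied from the rows above; the variable names are illustrative only), not part of the original notebook run.

```python
import pandas as pd

# Two birth dates in the same month/day/year string format as the CSV
raw = pd.Series(['5/17/1967', '3/7/1936'])
birth = pd.to_datetime(raw, format="%m/%d/%Y")

# The .dt accessor pulls out individual components of each timestamp
birth_year = birth.dt.year
birth_month = birth.dt.month

# Subtracting a Series from a Timestamp gives a timedelta Series
zarya = pd.Timestamp('1998-11-20')
days_before_zarya = (zarya - birth).dt.days
print(birth_year.tolist(), birth_month.tolist())
```

Dividing `days_before_zarya` by 365, as done earlier, is only an approximation of age because it ignores leap years.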


In [8]:
print(pd.__version__)

2.2.2


## Categoricals
Why use them? Other libraries that pandas interfaces with can make use of the category information, you can provide an explicit sort order rather than the default lexical order, and you get large speed improvements when you group on categories.

In [9]:
from datetime import datetime

In [10]:
df['Military Rank'].unique()

array([nan, 'Colonel', 'Lieutenant Colonel', 'Captain', 'Major General',
       'Commander', 'Lieutenant Commander', 'Brigadier General', 'Major',
       'Lieutenant General', 'Chief Warrant Officer', 'Rear Admiral',
       'Vice Admiral'], dtype=object)

In [11]:
df['Military Rank'].dtype

dtype('O')

In [12]:
df['Military Rank'] = df['Military Rank'].astype('category')
df['Military Rank'].dtype

CategoricalDtype(categories=['Brigadier General', 'Captain', 'Chief Warrant Officer',
                  'Colonel', 'Commander', 'Lieutenant Colonel',
                  'Lieutenant Commander', 'Lieutenant General', 'Major',
                  'Major General', 'Rear Admiral', 'Vice Admiral'],
, ordered=False, categories_dtype=object)

In [10]:
# pd.Categorical(df['Military Rank'])
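To illustrate the explicit sort order mentioned above, here is a small sketch using `pd.CategoricalDtype` with `ordered=True`. The three-rank ordering is a made-up, partial example for illustration, not an authoritative military hierarchy, and the Series is hand-built rather than taken from the dataset.

```python
import pandas as pd

# A small stand-in for the 'Military Rank' column, including a missing value
ranks = pd.Series(['Colonel', 'Captain', 'Major', 'Colonel', None])

# Hypothetical ordering, lowest to highest, for illustration only
rank_order = ['Captain', 'Major', 'Colonel']
rank_type = pd.CategoricalDtype(categories=rank_order, ordered=True)

ordered_ranks = ranks.astype(rank_type)

# Sorting now follows rank_order instead of alphabetical order
sorted_ranks = ordered_ranks.sort_values()

# Ordered categoricals also support min/max and comparisons
highest = ordered_ranks.max()
above_captain = ordered_ranks > 'Captain'
print(sorted_ranks.tolist())
```

With an unordered categorical (the default from `astype('category')`), sorting falls back to the order of the categories, which is lexical for strings.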

## Numeric / String Conversion

In [13]:
df.head(1)

Unnamed: 0,Name,Year,Group,Status,Birth Date,Birth Place,Gender,Alma Mater,Undergraduate Major,Graduate Major,...,Military Branch,Space Flights,Space Flight (hr),Space Walks,Space Walks (hr),Missions,Death Date,Death Mission,age_at_zarya,birth
0,Joseph M. Acaba,2004.0,19.0,Active,5/17/1967,"Inglewood, CA",Male,University of California-Santa Barbara; Univer...,Geology,Geology,...,,2,3307,2,13.0,"STS-119 (Discovery), ISS-31/32 (Soyuz)",,,31,1967-05-17


In [14]:
df.age_at_zarya.astype(str).astype('int')[0]

np.int64(31)
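`astype('int')` raises an error if any string is not a valid number. When the data may be messy, `pd.to_numeric` with `errors='coerce'` is one alternative; the sketch below uses a made-up Series with one unparseable entry.

```python
import pandas as pd

# Strings that should be numbers, plus one value that cannot be parsed
raw = pd.Series(['31', '62', 'unknown'])

# errors='coerce' turns unparseable entries into NaN instead of raising
numeric = pd.to_numeric(raw, errors='coerce')

# Going the other way: numbers back to strings
as_text = numeric.dropna().astype(int).astype(str)
print(numeric.tolist(), as_text.tolist())
```

Because NaN is a float, the coerced column ends up as a float dtype rather than an integer one.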

## Removing Columns or Rows

In [15]:
df2 = df[['Name', 'Year', 'Group']]
df2.head(2)

Unnamed: 0,Name,Year,Group
0,Joseph M. Acaba,2004.0,19.0
1,Loren W. Acton,,


In [16]:
df2.drop('Group', axis=1).head(2)

Unnamed: 0,Name,Year
0,Joseph M. Acaba,2004.0
1,Loren W. Acton,


In [17]:
df2.drop(1).head(2)

Unnamed: 0,Name,Year,Group
0,Joseph M. Acaba,2004.0,19.0
2,James C. Adamson,1984.0,10.0


In [18]:
df2.drop(columns='Group').head(2)

Unnamed: 0,Name,Year
0,Joseph M. Acaba,2004.0
1,Loren W. Acton,
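Note that `drop` returns a new DataFrame and leaves the original untouched unless `inplace=True` is passed. A small sketch on a hand-made frame (the values mimic the rows above; the variable names are illustrative) also shows `dropna` as one way to remove rows by content rather than by label.

```python
import pandas as pd

small = pd.DataFrame({
    'Name': ['Joseph M. Acaba', 'Loren W. Acton', 'James C. Adamson'],
    'Year': [2004.0, None, 1984.0],
    'Group': [19.0, None, 10.0],
})

# drop returns a copy; 'small' itself still has the Group column
no_group = small.drop(columns='Group')

# Rows are dropped by index label
no_second_row = small.drop(index=1)

# dropna removes rows based on missing values in the chosen columns
no_missing_year = small.dropna(subset=['Year'])
print(list(no_group.columns), no_second_row.index.tolist())
```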


## Adding rows
I would have loved to be an astronaut. Alas, Australia had no space program when I was a kid.

In [19]:

# dt = pd.DataFrame(data={'Name':'Samuel Hinton', 'Year': 2010, 'Group': 20.0}, index=df2)
# df2.loc[len(df2), ['Name', 'Year', 'Group']] =  ('Samuel Hinton', 2010, 20.0)
df2 = pd.concat([df2, pd.DataFrame.from_records([{'Name':'Samuel Hinton', 'Year': 2010, 'Group': 20.0}])],
          ignore_index=True)


In [20]:
df2.tail()

Unnamed: 0,Name,Year,Group
353,Neil W. Woodward III,1998.0,17.0
354,Alfred M. Worden,1966.0,5.0
355,John W. Young,1962.0,2.0
356,George D. Zamka,1998.0,17.0
357,Samuel Hinton,2010.0,20.0


In [21]:
df_sis = pd.DataFrame({'Name': ['Al Hinton'], 'Year': [2010], 'Group': [20.0]})
df_sis

Unnamed: 0,Name,Year,Group
0,Al Hinton,2010,20.0


In [22]:
df2 = pd.concat([df2, df_sis], ignore_index=True)

In [33]:
df2.tail()

Unnamed: 0,Name,Year,Group
354,Alfred M. Worden,1966.0,5.0
355,John W. Young,1962.0,2.0
356,George D. Zamka,1998.0,17.0
357,Samuel Hinton,2010.0,20.0
358,Al Hinton,2010.0,20.0


## Adding Columns
What if I want it at a specific location? Use `insert`.

In [42]:
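# Clean up 'FirstName' left over from an earlier run; errors='ignore' avoids a KeyError if it is absent
df2.drop(columns='FirstName', inplace=True, errors='ignore')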

In [43]:
df2['col1'] = 'Whoa'
df2.head(2)

Unnamed: 0,Name,Year,Group,col1
0,Joseph M. Acaba,2004.0,19.0,Whoa
1,Loren W. Acton,,,Whoa


In [59]:
# df2.loc[df2.index].Name.str.split(' ', expand=True)[0]
df2.insert(0, 'FirstName', df2.Name.str.split(' ', expand=True)[0])

In [60]:
df2.head(3)

Unnamed: 0,FirstName,Name,Year,Group,col1
0,Joseph,Joseph M. Acaba,2004.0,19.0,Whoa
1,Loren,Loren W. Acton,,,Whoa
2,James,James C. Adamson,1984.0,10.0,Whoa


In [55]:
df2.Name.str.split(' ', expand=True)

Unnamed: 0,0,1,2,3,4
0,Joseph,M.,Acaba,,
1,Loren,W.,Acton,,
2,James,C.,Adamson,,
3,Thomas,D.,Akers,,
4,Buzz,Aldrin,,,
...,...,...,...,...,...
354,Alfred,M.,Worden,,
355,John,W.,Young,,
356,George,D.,Zamka,,
357,Samuel,Hinton,,,
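As the output above shows, splitting on every space gives a ragged result with as many columns as the longest name. If only the first name and the remainder are needed, limiting the number of splits with `n=1` is one option. This is a sketch on three hand-picked names; the column labels `FirstName` and `Rest` are chosen here for illustration.

```python
import pandas as pd

names = pd.Series(['Joseph M. Acaba', 'Buzz Aldrin', 'Neil W. Woodward III'])

# n=1 splits only at the first space, so every row yields exactly two parts
parts = names.str.split(' ', n=1, expand=True)
parts.columns = ['FirstName', 'Rest']
print(parts)
```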


## I want to work with rows/columns and I have columns/rows

In [61]:
df3 = df.set_index('Name')
df3.head(2)

Unnamed: 0_level_0,Year,Group,Status,Birth Date,Birth Place,Gender,Alma Mater,Undergraduate Major,Graduate Major,Military Rank,Military Branch,Space Flights,Space Flight (hr),Space Walks,Space Walks (hr),Missions,Death Date,Death Mission,age_at_zarya,birth
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1
Joseph M. Acaba,2004.0,19.0,Active,5/17/1967,"Inglewood, CA",Male,University of California-Santa Barbara; Univer...,Geology,Geology,,,2,3307,2,13.0,"STS-119 (Discovery), ISS-31/32 (Soyuz)",,,31,1967-05-17
Loren W. Acton,,,Retired,3/7/1936,"Lewiston, MT",Male,Montana State University; University of Colorado,Engineering Physics,Solar Physics,,,1,190,0,0.0,STS 51-F (Challenger),,,62,1936-03-07


In [64]:
df3.T

Name,Joseph M. Acaba,Loren W. Acton,James C. Adamson,Thomas D. Akers,Buzz Aldrin,Andrew M. Allen,Joseph P. Allen,Scott D. Altman,William A. Anders,Clayton C. Anderson,...,Sunita L. Williams,Barry E. Wilmore,Stephanie D. Wilson,G. Reid Wiseman,Peter J. K. Wisoff,David A. Wolf,Neil W. Woodward III,Alfred M. Worden,John W. Young,George D. Zamka
Year,2004.0,,1984.0,1987.0,1963.0,1987.0,1967.0,1995.0,1963.0,1998.0,...,1998.0,2000.0,1996.0,2009.0,1990.0,1990.0,1998.0,1966.0,1962.0,1998.0
Group,19.0,,10.0,12.0,3.0,12.0,6.0,15.0,3.0,17.0,...,17.0,18.0,16.0,20.0,13.0,13.0,17.0,5.0,2.0,17.0
Status,Active,Retired,Retired,Retired,Retired,Retired,Retired,Retired,Retired,Retired,...,Active,Active,Active,Active,Retired,Retired,Retired,Retired,Retired,Retired
Birth Date,5/17/1967,3/7/1936,3/3/1946,5/20/1951,1/20/1930,8/4/1955,6/27/1937,8/15/1959,10/17/1933,2/23/1959,...,9/19/1965,12/29/1962,9/27/1966,11/11/1975,8/16/1958,8/23/1956,7/26/1962,2/7/1932,9/24/1930,6/29/1962
Birth Place,"Inglewood, CA","Lewiston, MT","Warsaw, NY","St. Louis, MO","Montclair, NJ","Philadelphia, PA","Crawsfordsville, IN","Lincoln, IL",Hong Kong,"Omaha, NE",...,"Euclid, OH","Murfreesboro, TN","Boston, MA","Baltimore, MD","Norfolk, VA","Indianapolis, IN","Chicago, IL","Jackson, MI","San Francisco, CA","Jersey City, NJ"
Gender,Male,Male,Male,Male,Male,Male,Male,Male,Male,Male,...,Female,Male,Female,Male,Male,Male,Male,Male,Male,Male
Alma Mater,University of California-Santa Barbara; Univer...,Montana State University; University of Colorado,US Military Academy; Princeton University,University of Missouri-Rolla,US Military Academy; MIT,Villanova University; University of Florida,DePauw University; Yale University,University of Illinois; US Naval Postgraduate ...,US Naval Academy; Air Force Institute of Techn...,Hastings College; Iowa State University,...,US Naval Academy; Florida Institute of Technology,Tennessee Technological University; University...,Harvard University; University of Texas,Rensselaer Polytechnic Institute; Johns Hopkin...,University of Virginia; Stanford University,Purdue University; Indiana University,MIT; University of Texas-Austin; George Washin...,US Military Academy; University of Michigan,Georgia Institute of Technology,US Naval Academy; Florida Institute of Technology
Undergraduate Major,Geology,Engineering Physics,Engineering,Applied Mathematics,Mechanical Engineering,Mechanical Engineering,Mathematics & Physics,Aeronautical & Astronautical Engineering,Nuclear Engineering,Physics,...,Physical Science,Electrical Engineering,Engineering Science,Computer & Systems Engineering,Physics,Electrical Engineering,Physics,Military Science,Aeronautical Engineering,Mathematics
Graduate Major,Geology,Solar Physics,Aerospace Engineering,Applied Mathematics,Astronautics,Business Administration,Physics,Aeronautical Engineering,Nuclear Engineering,Aerospace Engineering,...,Engineering Management,Electrical Engineering; Aviation Systems,Aerospace Engineering,Systems Engineering,Applied Physics,Medicine,Physics; Business Management,Aeronautical & Astronautical Engineering,,Engineering Management
Military Rank,,,Colonel,Colonel,Colonel,Lieutenant Colonel,,Captain,Major General,,...,Captain,Captain,,Commander,,,Commander,Colonel,Captain,Colonel
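One thing to keep in mind when transposing: a pandas column has a single dtype, so transposing a frame with mixed column types typically leaves every resulting column as `object`. A minimal sketch on a hand-made two-row frame (values copied from the table above):

```python
import pandas as pd

small = pd.DataFrame({
    'Name': ['Joseph M. Acaba', 'James C. Adamson'],
    'Year': [2004, 1984],
    'Status': ['Active', 'Retired'],
}).set_index('Name')

# Rows become columns and columns become rows
flipped = small.T

# Each new column mixes numbers and strings, so its dtype is object
print(flipped.dtypes)
print(flipped.loc['Year', 'Joseph M. Acaba'])
```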


## Recap
