# Problem Statement from e-mail
Hi Ben, sorry to bother you again. I visited the link above, but split seems to work only if the delimiter applies to all of my cells. Here is what my data looks like:

column name: id
            20190608-17848
            20190701-11980
            11981
            67890
            20190701-11980

My goal is to update all the cells so that I end up with only the last 5 digits after the dash. The cells that don't have a dash or any digits before it, like 11981, should be left as is.
So I want to find the cells that contain characters followed by a dash and remove everything up to and including the dash.

When I use split, it splits my column only for the cells that contain a dash, which is partially what I want, but the rest of the cells, which don't have a dash, are turned into NaNs. How do I prevent this from happening?
Here is my code for split:
df[['idpredash', 'idmodified']] = df.id.str.split("-", expand = True)

In Excel I solve this with Find and Replace, entering *- under Find What and nothing under Replace With. This does what I need in Excel, but I want to do the same in Python.
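
A note on why the missing values appear: with `expand=True`, `str.split` returns one column per piece, and a row with no dash yields only one piece, so its second column is left empty. The sketch below reproduces this on a two-row sample taken from the data above. The names `df_demo` and `parts` are illustrative and do not come from the e-mail.

```python
import pandas as pd

# Two ids from the e-mail: one with a dash, one without
df_demo = pd.DataFrame({'id': ['20190608-17848', '11981']})

# expand=True spreads the pieces over columns 0 and 1
parts = df_demo['id'].str.split('-', expand=True)

# The row without a dash has only one piece, so column 1 is missing there
print(parts)
print(parts[1].isna())
```

The id without a dash is not lost. It ends up in the first column, while the second column, which was meant to hold the cleaned id, has nothing to show for that row.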

# Load data and libraries

In [42]:
import pandas as pd

#Data coming from a .csv file
df_csv = pd.read_csv('data/Book1.csv')

#or Data coming from an Excel file
df_excel = pd.read_excel('data/Book1.xls')

#Display the DataFrames to ensure they loaded correctly
display(df_csv)
display(df_excel)

Unnamed: 0,id
0,20190608-17848
1,20190701-11980
2,11981
3,67890
4,20190701-11982


Unnamed: 0,id
0,20190608-17848
1,20190701-11980
2,11981
3,67890
4,20190701-11982


**Since you only need the last five digits of each entry, we tell Python to pull only the last five characters. Because exactly five digits follow the dash, this removes both the dash and the date that precedes it, and entries without a dash are already five digits long, so they come through unchanged.**

We then put these last five digits into a new column. 
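
For comparison, the same slice can be taken for the whole column at once with the pandas `.str` accessor. This is a minimal sketch on a hand-built copy of the sample data. The name `df_sample` is illustrative, and `astype(str)` is included on the assumption that some ids may load as numbers rather than text, which is also why the loop below wraps each entry in `str()`.

```python
import pandas as pd

# Hand-built stand-in for the 'id' column shown above
df_sample = pd.DataFrame({
    'id': ['20190608-17848', '20190701-11980', '11981', '67890', '20190701-11982']
})

# Convert to text first, then keep the last five characters of every entry
df_sample['newid'] = df_sample['id'].astype(str).str[-5:]

print(df_sample)
```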

In [45]:
#Long-winded way of doing it
#Loop over (index label, value) pairs so each result lands in its own row
for idx, entry in df_csv['id'].items():
    df_csv.loc[idx, 'newid'] = str(entry)[-5:] #pull only the last five characters of each entry


#Check our work
df_csv

Unnamed: 0,id,newid
0,20190608-17848,17848
1,20190701-11980,11980
2,11981,11981
3,67890,67890
4,20190701-11982,11982


In [46]:
#Long-winded way of doing it, shown here for the DataFrame loaded from the Excel file

#Loop over (index label, value) pairs so each result lands in its own row
for idx, entry in df_excel['id'].items():
    df_excel.loc[idx, 'newid'] = str(entry)[-5:] #pull only the last five characters of each entry


#Check our work
df_excel

Unnamed: 0,id,newid
0,20190608-17848,17848
1,20190701-11980,11980
2,11981,11981
3,67890,67890
4,20190701-11982,11982
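
Slicing the last five characters relies on every id ending in exactly five digits. If that cannot be guaranteed, two alternatives that key on the dash itself may be safer. Both are sketches on a hand-built sample, and the column names `newid_split` and `newid_regex` are hypothetical. The regex variant is the closest analogue to the Excel Find and Replace of `*-` mentioned in the e-mail.

```python
import pandas as pd

# Hand-built sample; one id is a plain integer, as may happen with Excel data
df_alt = pd.DataFrame({
    'id': ['20190608-17848', '20190701-11980', 11981, '67890', '20190701-11982']
})

# Work on a text version of the column so the .str methods apply to every row
ids = df_alt['id'].astype(str)

# Option 1: split on the dash WITHOUT expand=True and keep the last piece.
# An id with no dash splits into a one-item list, so it is returned unchanged.
df_alt['newid_split'] = ids.str.split('-').str[-1]

# Option 2: remove everything up to and including the dash with a regex.
# Ids with no dash do not match the pattern and are left as they are.
df_alt['newid_regex'] = ids.str.replace(r'^.*-', '', regex=True)

print(df_alt)
```

Neither option produces missing values for ids without a dash, because nothing is expanded into separate columns.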
