### Python | Pandas Series.str.slice()

Python is a great language for data analysis, primarily because of its ecosystem of data-centric Python packages. Pandas is one of those packages, and it makes importing and analyzing data much easier.

The pandas Series.str.slice() method extracts a substring from each string in a Series. It follows the same principle as Python's built-in slicing, [start:stop:step]: where to start, where to stop, and how many positions to advance at each step. All three parameters are optional.
Because this is a pandas string method, it must be called through the .str accessor. Calling it directly on the Series, or using .str on a non-string column, raises an error.
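As a minimal sketch of the .str requirement, the snippet below uses a small hypothetical Series of floats (not the nba.csv data) to show that the accessor is rejected on a numeric column and works once the values are converted to strings.

```python
import pandas as pd

# Hypothetical values, chosen to resemble a salary column
numbers = pd.Series([7730337.0, 6796117.0])

error_message = None
try:
    # .str is only available for string data, so this raises AttributeError
    numbers.str.slice(0, -2)
except AttributeError as exc:
    error_message = str(exc)
    print("float Series:", error_message)

# After converting to strings, the same call works
as_text = numbers.astype(str)
trimmed = as_text.str.slice(0, -2).tolist()
print(trimmed)
```

This is why the notebook converts Salary with astype(str) before slicing it.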

Syntax: Series.str.slice(start=None, stop=None, step=None)

Parameters:<br>
start: int, position at which slicing starts<br>
stop: int, position at which slicing stops (this position is not included)<br>
step: int, number of positions to advance between selected characters<br>

Return type: Series with sliced substrings

(https://www.geeksforgeeks.org/python-pandas-series-str-slice/)
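The three parameters can be tried on a small Series before applying them to a real dataset. The names below are hypothetical sample data, and the sketch only illustrates how start, stop and step behave.

```python
import pandas as pd

# Hypothetical sample data for illustration
names = pd.Series(["Avery Bradley", "Jae Crowder", "Trey Lyles"])

first_five = names.str.slice(0, 5)        # characters at positions 0..4
without_tail = names.str.slice(stop=-2)   # everything except the last two characters
every_other = names.str.slice(step=2)     # every second character
bracket_form = names.str[0:5]             # indexing syntax, same result as slice(0, 5)

print(first_five.tolist())
print(without_tail.tolist())
print(every_other.tolist())
```

The bracket form on the .str accessor is often shorter to write, while str.slice() is convenient when the bounds are held in variables.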

In [1]:
import pandas as pd
import numpy as np

In [2]:
data1 = pd.read_csv(r'c:/Users/srini/OneDrive/kaggle/nba.csv')

In [3]:
# removing null values to avoid errors
data1.dropna(inplace=True)

In [5]:
start,stop,step = 0, -2, 1

In [6]:
# convert Salary to strings so that the .str accessor can be used
data1['Salary'] = data1['Salary'].astype(str)

In [7]:
# drop the last two characters ('.0') of each salary string; the result is still a string
data1['Salary(int)'] = data1['Salary'].str.slice(start,stop,step)

In [8]:
data1

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary,Salary(int)
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0,7730337
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0,6796117
3,R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0,1148640
6,Jordan Mickey,Boston Celtics,55.0,PF,21.0,6-8,235.0,LSU,1170960.0,1170960
7,Kelly Olynyk,Boston Celtics,41.0,C,25.0,7-0,238.0,Gonzaga,2165160.0,2165160
...,...,...,...,...,...,...,...,...,...,...
449,Rodney Hood,Utah Jazz,5.0,SG,23.0,6-8,206.0,Duke,1348440.0,1348440
451,Chris Johnson,Utah Jazz,23.0,SF,26.0,6-6,206.0,Dayton,981348.0,981348
452,Trey Lyles,Utah Jazz,41.0,PF,20.0,6-10,234.0,Kentucky,2239800.0,2239800
453,Shelvin Mack,Utah Jazz,8.0,PG,26.0,6-3,203.0,Butler,2433333.0,2433333
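Note that the Salary(int) column produced above holds strings that merely look like integers. If a numeric column is the goal, a type conversion is an alternative to string slicing. The sketch below assumes a small hypothetical stand-in for the Salary column and compares both approaches.

```python
import pandas as pd

# Hypothetical stand-in for the Salary column of nba.csv
salary = pd.Series([7730337.0, 6796117.0, 981348.0], name="Salary")

# String slicing: removes the trailing ".0" but leaves text values
sliced = salary.astype(str).str.slice(0, -2)

# Type conversion: produces real integers that support arithmetic
converted = salary.astype("int64")

print(sliced.tolist())
print(converted.tolist())
print(converted.sum())
```

The slicing approach also depends on every value ending in exactly ".0", whereas astype works on the numeric value itself.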


In [10]:
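start = 0
stop = -2
step = 2
# Note: this overwrites 'Name' in place, so re-running the cell slices the
# already-sliced values again
data1['Name'] = data1['Name'].str.slice(start, stop, step)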
data1

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary,Salary(int)
0,Ay,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0,7730337
1,JC,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0,6796117
3,R,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0,1148640
6,Ja,Boston Celtics,55.0,PF,21.0,6-8,235.0,LSU,1170960.0,1170960
7,Ky,Boston Celtics,41.0,C,25.0,7-0,238.0,Gonzaga,2165160.0,2165160
...,...,...,...,...,...,...,...,...,...,...
449,Re,Utah Jazz,5.0,SG,23.0,6-8,206.0,Duke,1348440.0,1348440
451,Cs,Utah Jazz,23.0,SF,26.0,6-6,206.0,Dayton,981348.0,981348
452,T,Utah Jazz,41.0,PF,20.0,6-10,234.0,Kentucky,2239800.0,2239800
453,Sv,Utah Jazz,8.0,PG,26.0,6-3,203.0,Butler,2433333.0,2433333
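The Name values in the output above (for example "Ay" for Avery Bradley) are shorter than a single slice(0, -2, 2) would produce. One explanation, which is an assumption here rather than something the notebook states, is that the cell was executed twice, so the slice was applied to names that had already been sliced. The sketch below reproduces that effect on three of the names.

```python
import pandas as pd

names = pd.Series(["Avery Bradley", "Jae Crowder", "Trey Lyles"])

# One application: drop the last two characters, then keep every second one
once = names.str.slice(0, -2, 2)

# Second application on the already-sliced values
twice = once.str.slice(0, -2, 2)

print(once.tolist())
print(twice.tolist())
```

Writing the result to a new column instead of overwriting Name would make the cell safe to re-run.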


***How to take column-slices of DataFrame in Pandas?***

This section shows how to slice a DataFrame column-wise in Python. A DataFrame is a two-dimensional tabular data structure with labeled axes (rows and columns).

https://www.geeksforgeeks.org/how-to-take-column-slices-of-dataframe-in-pandas/

In [11]:
data2 = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6, 7], 
                    "b": [2, 3, 4, 2, 3, 4, 5], 
                    "c": [3, 4, 5, 2, 3, 4, 5], 
                    "d": [4, 5, 6, 2, 3, 4, 5], 
                    "e": [5, 6, 7, 2, 3, 4, 5]})
data2

Unnamed: 0,a,b,c,d,e
0,1,2,3,4,5
1,2,3,4,5,6
2,3,4,5,6,7
3,4,2,2,2,2
4,5,3,3,3,3
5,6,4,4,4,4
6,7,5,5,5,5


In [12]:
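# Slice Columns in pandas using reindex
# (named 'reindexed' rather than 're' to avoid shadowing the standard-library re module)
reindexed = data2.reindex(columns=['c','d'])
reindexed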

Unnamed: 0,c,d
0,3,4
1,4,5
2,5,6
3,2,2
4,3,3
5,4,4
6,5,5
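reindex is not the only way to pick columns by name, and it behaves differently from plain bracket selection when a requested column does not exist. The following sketch uses a small hypothetical frame to illustrate the difference; the column name "z" is deliberately absent.

```python
import pandas as pd

# Hypothetical frame for illustration
frame = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

picked = frame[["c", "a"]]                      # bracket selection
reindexed = frame.reindex(columns=["c", "a"])   # same columns via reindex

# reindex silently creates a NaN-filled column for an unknown label
with_missing = frame.reindex(columns=["c", "z"])

# bracket selection raises KeyError for an unknown label
bracket_failed = False
try:
    frame[["c", "z"]]
except KeyError:
    bracket_failed = True

print(with_missing)
print("bracket selection failed:", bracket_failed)
```

Bracket selection is therefore the stricter choice when a missing column should be treated as a mistake.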


In [13]:
# Slice Columns in pandas using loc[]
l = data2.loc[:,"b":"d":2]
l

Unnamed: 0,b,d
0,2,4
1,3,5
2,4,6
3,2,2
4,3,3
5,4,4
6,5,5


In [9]:
l1 = data2.loc[:,"c":"e":1]
l1

Unnamed: 0,c,d,e
0,3,4,5
1,4,5,6
2,5,6,7
3,2,2,2
4,3,3,3
5,4,4,4
6,5,5,5
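A detail worth noting in the loc[] examples above is that label slices include the end label, unlike ordinary Python slices. A minimal sketch on a hypothetical one-row frame:

```python
import pandas as pd

# Hypothetical one-row frame with the same column names as data2
frame = pd.DataFrame({"a": [1], "b": [2], "c": [3], "d": [4], "e": [5]})

b_to_d = list(frame.loc[:, "b":"d"].columns)        # end label "d" is included
b_to_d_step2 = list(frame.loc[:, "b":"d":2].columns)  # every second column from "b" to "d"
from_c = list(frame.loc[:, "c":].columns)           # open-ended slice to the last column

print(b_to_d)
print(b_to_d_step2)
print(from_c)
```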


In [10]:
# Slice Columns in pandas using iloc[]
il = data2.iloc[:,1:3:1]
il

Unnamed: 0,b,c
0,2,3
1,3,4
2,4,5
3,2,2
4,3,3
5,4,4
6,5,5


In [11]:
il1 = data2.iloc[:,0:3:2]
il1

Unnamed: 0,a,c
0,1,3
1,2,4
2,3,5
3,4,2
4,5,3
5,6,4
6,7,5
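In contrast to loc[], iloc[] works on integer positions and excludes the stop position, exactly like Python list slicing. The sketch below, again on a hypothetical one-row frame, shows a few common forms.

```python
import pandas as pd

# Hypothetical one-row frame with the same column names as data2
frame = pd.DataFrame({"a": [1], "b": [2], "c": [3], "d": [4], "e": [5]})

pos_1_to_3 = list(frame.iloc[:, 1:3].columns)     # positions 1 and 2; position 3 is excluded
every_second = list(frame.iloc[:, 0:3:2].columns)  # positions 0 and 2
last_two = list(frame.iloc[:, -2:].columns)       # negative positions count from the end
chosen = list(frame.iloc[:, [0, 4]].columns)      # an explicit list of positions

print(pos_1_to_3, every_second, last_two, chosen)
```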


(https://www.geeksforgeeks.org/python-pandas-apply/)

***Python | Pandas.apply()***

Series.apply() lets you pass a function and apply it to every single value of a pandas Series. DataFrame.apply() instead passes each column (or, with axis=1, each row) to the function as a Series. This makes apply convenient for deriving or categorizing values according to custom conditions, a common step in data science and machine learning workflows.
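As a minimal sketch of element-wise use, the example below applies a hypothetical age_group function to a small Series of ages. Both the function and the values are made up for illustration.

```python
import pandas as pd

# Hypothetical ages for illustration
ages = pd.Series([25.0, 22.0, 31.0], name="Age")

def age_group(age):
    """Label a single age value (hypothetical helper)."""
    return "under 25" if age < 25 else "25 and over"

# Series.apply calls age_group once per value
groups = ages.apply(age_group)
print(groups.tolist())
```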

In [15]:
# this didn't work; ignore this cell
data12 = pd.read_csv(r'c:/Users/srini/OneDrive/kaggle/nba.csv')
data12.dropna(inplace=True)
data_nba = pd.DataFrame(data12)
data_nba.head()

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
3,R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
6,Jordan Mickey,Boston Celtics,55.0,PF,21.0,6-8,235.0,LSU,1170960.0
7,Kelly Olynyk,Boston Celtics,41.0,C,25.0,7-0,238.0,Gonzaga,2165160.0


In [16]:
print(data_nba.dtypes)
data_nba['Salary'] = data_nba['Salary'].astype('int')


Name         object
Team         object
Number      float64
Position     object
Age         float64
Height       object
Weight      float64
College      object
Salary      float64
dtype: object


In [17]:
print(data_nba.dtypes)

Name         object
Team         object
Number      float64
Position     object
Age         float64
Height       object
Weight      float64
College      object
Salary        int32
dtype: object


In [18]:
data_salary_mean = data12['Salary'].mean()
data_salary_mean = data_salary_mean.round(2)
data_salary_mean

4620311.07

In [19]:
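# The nested list creates a DataFrame with 1 row and 15 columns
marks = [[99,88,90,100,56,89,100,45,89,97,95,80,85,67,69]]
s_dict = pd.DataFrame(marks)
# astype returns a new object, so the result must be assigned back
s_dict = s_dict.astype('int')

def cal_sal(num):
    # expects a single number, not a Series
    if num <= 80:
        return 'C-Grade'
    elif 85 <= num <= 90:
        return 'B-Grade'
    else:
        return 'A-Grade'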

In [77]:
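new = s_dict.apply(cal_sal)
# Fails: DataFrame.apply passes each column to cal_sal as a Series,
# so `num <= 80` has no single truth value. Parked for now.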

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
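The ValueError above arises because DataFrame.apply hands each column to the function as a Series, while cal_sal compares a single number. One possible fix, sketched here under the assumption that the marks are meant to be graded one by one, is to store them in a Series and use Series.apply. The grading logic is kept as in the original, so marks from 81 to 84 fall into the final else branch.

```python
import pandas as pd

marks = [99, 88, 90, 100, 56, 89, 100, 45, 89, 97, 95, 80, 85, 67, 69]
scores = pd.Series(marks)  # a flat list gives one value per row

def cal_sal(num):
    # Same thresholds as the original function
    if num <= 80:
        return 'C-Grade'
    elif 85 <= num <= 90:
        return 'B-Grade'
    else:
        return 'A-Grade'

# Series.apply calls cal_sal once per mark
grades = scores.apply(cal_sal)
grade_counts = grades.value_counts().to_dict()

print(grades.tolist())
print(grade_counts)
```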

In [22]:
df2 = pd.DataFrame({'A': [1, 2], 'B': [10, 20]})
def sq(num):
    return num * num
df3 = df2.apply(sq)
print("before")
print(df2)
print("after")
print(df3)

before
   A   B
0  1  10
1  2  20
after
   A    B
0  1  100
1  4  400
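The sq example works with DataFrame.apply because multiplying a whole column by itself is a valid Series operation. As a rough sketch, the snippet below compares that call with the equivalent vectorized expression and shows a row-wise call using axis=1. The lambda functions are illustrative and not part of the original notebook.

```python
import pandas as pd

df2 = pd.DataFrame({'A': [1, 2], 'B': [10, 20]})

# Column-wise apply: the function receives each column as a Series
squared_apply = df2.apply(lambda col: col * col)

# Vectorized equivalent, usually simpler and faster for arithmetic
squared_vector = df2 ** 2

# Row-wise apply: with axis=1 the function receives each row as a Series
row_totals = df2.apply(lambda row: row['A'] + row['B'], axis=1)

print(squared_apply.equals(squared_vector))
print(row_totals.tolist())
```

For plain arithmetic the vectorized form is generally preferable; apply is more useful when the logic cannot be expressed with built-in operations.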
