
# Python Data Types & Variables
## Variables
* A variable is created when you assign a value to it.
* A variable can be seen as a container to store certain values.
* Variable names can contain letters and digits, but they cannot begin with a digit.
* The underscore _ can appear in a name.
* Python's keywords cannot be used as variable names.
* In Python, variables do not need to be declared with any particular type, and can even change type after they have been set.
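
The naming rules and the dynamic typing described above can be checked directly. This is a minimal sketch: the candidate names and the variable `value` are illustrative and not part of the original notebook, while `str.isidentifier()` and `keyword.iskeyword()` come from the standard library.

```python
import keyword

# Candidate names to test (illustrative, not from the notebook).
candidates = ['temp_c', '_depth', 'rock2', '2rock', 'class']

valid = {}
for name in candidates:
    # A usable variable name must be a valid identifier and must not be a keyword.
    valid[name] = name.isidentifier() and not keyword.iskeyword(name)
    print(name, valid[name])

# A variable can be rebound to a value of a different type.
value = 7
first_type = type(value).__name__
value = 'seven'
second_type = type(value).__name__
print(first_type, second_type)
```

Here `2rock` fails because it begins with a digit, and `class` fails because it is a keyword.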


In [1]:
msg = 'Apple' # Create a variable named msg and assign a string to it.

In [2]:
n = 7 # Create a variable named n and assign the integer 7 to it.

In [3]:
x = int(7)
x

7

In [4]:
y = float(17)
y

17.0

In [5]:
z = str(27)
z

'27'

In [6]:
temp_c = 11
print('the temperature in Celsius:', temp_c)

the temperature in Celsius: 11


## Built-in Data Types
Use the type() function to get the data type of any object.

Values can be converted from one type to another with the int(), float(), and str() functions.

Variables can store data of different types, and each type supports different operations. Python's built-in data types include:
* Numeric Types: int, float, complex
* Text Type: str
* Sequence Types: list, tuple, range
* Boolean Type: bool
* ...
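
As a short illustration of the types listed above and of the conversion functions, the sketch below inspects one value from each group and then converts between int, float, and str. The sample values and variable names are illustrative, not taken from the notebook.

```python
# Inspect the type of one value from each group listed above.
samples = [227, 227.0, 2 + 3j, '227', [1, 2], (1, 2), range(3), True]
type_names = [type(v).__name__ for v in samples]
print(type_names)

# Convert between types with int(), float(), and str().
as_int = int('227')      # str -> int
as_float = float(227)    # int -> float
as_str = str(227.5)      # float -> str
truncated = int(7.9)     # float -> int drops the fractional part (no rounding)
print(as_int, as_float, as_str, truncated)
```

Note that int() truncates toward zero rather than rounding, so int(7.9) gives 7.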


In [7]:
type(227)

int

In [8]:
type(227.0)

float

In [9]:
type('227')

str

In [10]:
a = 7,000 # Don't use commas between digits. Python reads this as a comma-separated sequence of integers (a tuple).
a

(7, 0)

In [11]:
type(a)

tuple

In [12]:
b = ['apple','orange','pear']
type(b)

list

In [13]:
c = ('apple','orange','pear')
type(c)

tuple

In [14]:
n = True
type(n)

bool

In [15]:
t = 7 # The value is an integer.
print(t)
t = 7 + 1
print(t)
t = 7 + 0.7 # The value is now a float.
print(t)

7
8
7.7


## String Methods
Strings provide methods that perform a variety of useful operations. A method is similar to a function in that it takes arguments and returns a value, but the syntax is different: a method is called on an object using dot notation, as in r.upper().
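
To make the difference in syntax concrete, the sketch below calls a built-in function and two string methods on the same value. The variable names are illustrative.

```python
rock = 'basalt'  # illustrative value

# Function call: the object is passed as an argument.
length = len(rock)

# Method call: the method name follows the object and a dot.
upper = rock.upper()

# Methods can take arguments too, here the substring to count.
count_a = rock.count('a')

# Strings are immutable, so methods return new strings and leave the original unchanged.
print(length, upper, count_a, rock)
```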

In [16]:
r = 'basalt'
print(r.capitalize()) # invoke capitalize on 'basalt'
print(r.upper()) # invoke upper on 'basalt'
print(r.rjust(7)) # right-justify the string by padding with spaces to width 7
print(r.center(7)) # center the string by padding with spaces to width 7
print(' is an extrusive igneous rock.'.strip()) # strip leading and trailing whitespace

Basalt
BASALT
 basalt
 basalt
is an extrusive igneous rock.


In [17]:
s = 'sandstone'
print(s.center(11))

 sandstone 


## String Operations
* The + operator performs string concatenation: joining the strings by linking them end-to-end.
* The * operator performs repetition.
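
A brief sketch of both operators follows, including one common pitfall: + does not work between a string and a number. The variable names `joined`, `line`, and `label` are illustrative.

```python
first = 'black'
second = 'cat'

# Concatenation does not insert a separator; add one explicitly if needed.
joined = first + ' ' + second

# Repetition takes a string and an integer.
line = '-' * 10

# Mixing str and int with + raises TypeError; convert with str() first.
try:
    label = 'sample' + 7
except TypeError:
    label = 'sample' + str(7)

print(joined, line, label)
```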


In [18]:
first = 'black'
second = 'cat'
first+second

'blackcat'

In [19]:
print('x'*4)

xxxx


## Using Python to Automate Tedious Tasks
* It's important to use correct data types in data analysis.
* Pandas is a data analysis library built on top of Python.
* In Pandas, a column of strings has the dtype object, so operations on it are string operations rather than mathematical ones.
* The astype() method can be used to convert a column to an appropriate dtype.
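
The effect of the dtype on an operation can be seen on a small example. This is a sketch with hypothetical data: the `masses` DataFrame below is made up for illustration and is not the meteorite dataset, and the column is created with dtype object explicitly so the behavior does not depend on the Pandas version.

```python
import pandas as pd

# Hypothetical data: masses stored as text in an object column.
masses = pd.DataFrame({'mass': pd.Series(['21', '720', '780'], dtype=object)})
print(masses['mass'].dtype)

# On an object (string) column, + concatenates instead of adding.
as_text = masses['mass'] + masses['mass']

# After astype(float), the same operation is arithmetic.
masses['mass'] = masses['mass'].astype(float)
as_number = masses['mass'] + masses['mass']

print(as_text.tolist())
print(as_number.tolist())
```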

In [20]:
import pandas as pd
import numpy as np

In [21]:
from google.colab import files
uploaded = files.upload()

Saving meteorite-landings.csv to meteorite-landings.csv


In [22]:
df = pd.read_csv('meteorite-landings.csv')

In [23]:
df

Unnamed: 0,name,id,nametype,recclass,mass,fall,year,reclat,reclong,GeoLocation
0,Aachen,1,Valid,L5,21.0,Fell,1880.0,50.77500,6.08333,"(50.775000, 6.083330)"
1,Aarhus,2,Valid,H6,720.0,Fell,1951.0,56.18333,10.23333,"(56.183330, 10.233330)"
2,Abee,6,Valid,EH4,107000.0,Fell,1952.0,54.21667,-113.00000,"(54.216670, -113.000000)"
3,Acapulco,10,Valid,Acapulcoite,1914.0,Fell,1976.0,16.88333,-99.90000,"(16.883330, -99.900000)"
4,Achiras,370,Valid,L6,780.0,Fell,1902.0,-33.16667,-64.95000,"(-33.166670, -64.950000)"
...,...,...,...,...,...,...,...,...,...,...
45711,Zillah 002,31356,Valid,Eucrite,172.0,Found,1990.0,29.03700,17.01850,"(29.037000, 17.018500)"
45712,Zinder,30409,Valid,"Pallasite, ungrouped",46.0,Found,1999.0,13.78333,8.96667,"(13.783330, 8.966670)"
45713,Zlin,30410,Valid,H4,3.3,Found,1939.0,49.25000,17.66667,"(49.250000, 17.666670)"
45714,Zubkovsky,31357,Valid,L6,2167.0,Found,2003.0,49.78917,41.50460,"(49.789170, 41.504600)"


In [24]:
df.dtypes

name            object
id               int64
nametype        object
recclass        object
mass           float64
fall            object
year           float64
reclat         float64
reclong        float64
GeoLocation     object
dtype: object

In [25]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45716 entries, 0 to 45715
Data columns (total 10 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   name         45716 non-null  object 
 1   id           45716 non-null  int64  
 2   nametype     45716 non-null  object 
 3   recclass     45716 non-null  object 
 4   mass         45585 non-null  float64
 5   fall         45716 non-null  object 
 6   year         45428 non-null  float64
 7   reclat       38401 non-null  float64
 8   reclong      38401 non-null  float64
 9   GeoLocation  38401 non-null  object 
dtypes: float64(4), int64(1), object(5)
memory usage: 3.5+ MB


In [26]:
df_drop = df.dropna()
df_drop

Unnamed: 0,name,id,nametype,recclass,mass,fall,year,reclat,reclong,GeoLocation
0,Aachen,1,Valid,L5,21.0,Fell,1880.0,50.77500,6.08333,"(50.775000, 6.083330)"
1,Aarhus,2,Valid,H6,720.0,Fell,1951.0,56.18333,10.23333,"(56.183330, 10.233330)"
2,Abee,6,Valid,EH4,107000.0,Fell,1952.0,54.21667,-113.00000,"(54.216670, -113.000000)"
3,Acapulco,10,Valid,Acapulcoite,1914.0,Fell,1976.0,16.88333,-99.90000,"(16.883330, -99.900000)"
4,Achiras,370,Valid,L6,780.0,Fell,1902.0,-33.16667,-64.95000,"(-33.166670, -64.950000)"
...,...,...,...,...,...,...,...,...,...,...
45711,Zillah 002,31356,Valid,Eucrite,172.0,Found,1990.0,29.03700,17.01850,"(29.037000, 17.018500)"
45712,Zinder,30409,Valid,"Pallasite, ungrouped",46.0,Found,1999.0,13.78333,8.96667,"(13.783330, 8.966670)"
45713,Zlin,30410,Valid,H4,3.3,Found,1939.0,49.25000,17.66667,"(49.250000, 17.666670)"
45714,Zubkovsky,31357,Valid,L6,2167.0,Found,2003.0,49.78917,41.50460,"(49.789170, 41.504600)"


In [27]:
df_drop.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 38116 entries, 0 to 45715
Data columns (total 10 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   name         38116 non-null  object 
 1   id           38116 non-null  int64  
 2   nametype     38116 non-null  object 
 3   recclass     38116 non-null  object 
 4   mass         38116 non-null  float64
 5   fall         38116 non-null  object 
 6   year         38116 non-null  float64
 7   reclat       38116 non-null  float64
 8   reclong      38116 non-null  float64
 9   GeoLocation  38116 non-null  object 
dtypes: float64(4), int64(1), object(5)
memory usage: 3.2+ MB


In [28]:
df_drop['year'].astype('int')

0        1880
1        1951
2        1952
3        1976
4        1902
         ... 
45711    1990
45712    1999
45713    1939
45714    2003
45715    1976
Name: year, Length: 38116, dtype: int64

In [29]:
df_drop['year'] = df_drop['year'].astype('int')

SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.


In [30]:
df_drop.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 38116 entries, 0 to 45715
Data columns (total 10 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   name         38116 non-null  object 
 1   id           38116 non-null  int64  
 2   nametype     38116 non-null  object 
 3   recclass     38116 non-null  object 
 4   mass         38116 non-null  float64
 5   fall         38116 non-null  object 
 6   year         38116 non-null  int64  
 7   reclat       38116 non-null  float64
 8   reclong      38116 non-null  float64
 9   GeoLocation  38116 non-null  object 
dtypes: float64(3), int64(2), object(5)
memory usage: 3.2+ MB
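
The assignment in cell 29 triggers a Pandas warning (SettingWithCopyWarning) because Pandas cannot tell whether df_drop is an independent object or a view of df. One common way to avoid it is to take an explicit copy before assigning. The sketch below uses a small hypothetical DataFrame in place of the meteorite data; the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the meteorite table, with one missing year.
df = pd.DataFrame({'name': ['Aachen', 'Aarhus', 'Abee'],
                   'year': [1880.0, np.nan, 1952.0]})

# .copy() makes df_drop an independent DataFrame, so the assignment below
# modifies df_drop itself rather than a possible view of df.
df_drop = df.dropna().copy()
df_drop['year'] = df_drop['year'].astype('int')

print(df_drop['year'].tolist())
print(df['year'].dtype)  # the original DataFrame is unchanged
```

The rows with missing values are dropped first because NaN cannot be represented in an integer column.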
