In [1]:
import pandas as pd

# NUMPY CAPSTONE PROJECT - BLOOD DONATION
![blood_donation.png](blood_donation.png)
<p>Blood transfusion saves lives - from replacing lost blood during major surgery or a serious injury to treating various illnesses and blood disorders. Ensuring that there's enough blood in supply whenever needed is a serious challenge for the health professionals. According to <a href="https://www.webmd.com/a-to-z-guides/blood-transfusion-what-to-know#1">WebMD</a>, "about 5 million Americans need a blood transfusion every year".</p>
<p>Our dataset is from a mobile blood donation vehicle in Taiwan.</p>
<p>The data is stored in <code>datasets/transfusion.data</code> and it is structured according to RFMTC marketing model (a variation of RFM). 
<p>In this project, you are going to inspect the data using Numpy.</p>

#### IMPORTING LIBRARIES AND DATA

* Import `numpy` as np and genfromtxt as follows: `from numpy import genfromtxt`

* Call the data by using gentxt as follows: `gentxt("YourDirectory", delimiter = ","`

In [7]:
import numpy as np
from numpy import genfromtxt
my_data = genfromtxt("C:\\Users\\sam s\\Desktop\\techproed\\data science\\Capstone\\datasets\\transfusion.data",
                    delimiter =",")                   

* Inspect our data's type by `my_data`

In [8]:
type(my_data)

numpy.ndarray

* Use `ndim` to see how many dimensions data has.

In [9]:
my_data.ndim  # kac boyutlu : 2

2

* Return the first row our data.

In [10]:
my_data[0]  # ilksatir ve Nan deger

array([nan, nan, nan, nan, nan])

* First row contains `nan` values. Delete `nan` values by `np.delete()`
* Note: `nan` values are located in `0,0`

In [11]:
my_data = np.delete[my_data,0,0] # boylece nan degerler atilmis sekilde devam edebiliriz

TypeError: 'function' object is not subscriptable

* Return `my_data` to check whether you removed `nan` values or not.

In [12]:
my_data

array([[     nan,      nan,      nan,      nan,      nan],
       [2.00e+00, 5.00e+01, 1.25e+04, 9.80e+01, 1.00e+00],
       [0.00e+00, 1.30e+01, 3.25e+03, 2.80e+01, 1.00e+00],
       ...,
       [2.30e+01, 3.00e+00, 7.50e+02, 6.20e+01, 0.00e+00],
       [3.90e+01, 1.00e+00, 2.50e+02, 3.90e+01, 0.00e+00],
       [7.20e+01, 1.00e+00, 2.50e+02, 7.20e+01, 0.00e+00]])

* To see the dimensions of the data, use `shape`

In [13]:
my_data.shape

(749, 5)

* To see how many unit(eleman) you have on your data, use `size`

In [14]:
my_data.size

3745

* To see the data type inside `my_data`, use `dtype`

In [15]:
my_data.dtype

dtype('float64')

* To see the size of the each unit(eleman), use `itemsize`

In [16]:
my_data.itemsize

8

* Create a matrix that has 2 rows and 5 columns and contains 0 by `np.zeros`. Name it as `sifir`

In [24]:
sifir = np.zeros((2,5))
sifir

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

* Create a matrix that has 2 rows and 5 columns and contains 1 by `np.ones`. Name it as `bir`

In [18]:
bir = np.ones((2,5))
bir

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

* Create a matrix that has 2 rows and 5 columns and contains 38 by `np.full`. Name it as `otuzsekiz`

In [19]:
otuzsekiz = np.full((2,5), 38)

* Create an eye matrix that has 5 rows and 5 columns by `np.eye`. Name it as `eye`

In [20]:
eye= np.eye(5)  # eye= np.eye(5,5)  de oabilir
eye

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

* Create a matrix that has 2 rows and 5 columns and contains random values between 0 and 1 by `np.random.random`. Name it as `random`

In [21]:
random = np.random.random((2,5))  # random 0-1 arasi deger uretir
random   

array([[0.64704799, 0.94937763, 0.11742367, 0.9234556 , 0.06911688],
       [0.9072155 , 0.30668931, 0.2162135 , 0.68197535, 0.17500488]])

* Create a matrix that has 2 rows and 5 columns(use `reshape` for that) and contains values increases 1 at a time, and between 1 and 10 by `np.linspace`. Name it as `linsp`

In [22]:
linsp=np.linspace(1,10,10).reshape((2,5))
linsp

array([[ 1.,  2.,  3.,  4.,  5.],
       [ 6.,  7.,  8.,  9., 10.]])

* Extract `linsp` with `np.sqrt` and name the result as `linsp`

In [23]:
linsp = np.sqrt(linsp)
linsp

array([[1.        , 1.41421356, 1.73205081, 2.        , 2.23606798],
       [2.44948974, 2.64575131, 2.82842712, 3.        , 3.16227766]])

* exponentiate `random` and name the result as `random`

In [25]:
random = random**2
random

array([[0.4186711 , 0.90131788, 0.01378832, 0.85277024, 0.00477714],
       [0.82303997, 0.09405833, 0.04674828, 0.46509038, 0.03062671]])

* Sum `linsp` and `random` and name it as `toplam`

In [26]:
toplam = linsp + random
toplam

array([[1.4186711 , 2.31553145, 1.74583913, 2.85277024, 2.24084512],
       [3.27252971, 2.73980964, 2.8751754 , 3.46509038, 3.19290437]])

* Divide `bir` and `sifir` and name it as `bolme`
* If you receive and warning or error, briefly explain why

In [27]:
bolme1 = bir/sifir
bolme1  # sonsuz ve uyari verir

  bolme1 = bir/sifir


array([[inf, inf, inf, inf, inf],
       [inf, inf, inf, inf, inf]])

* Subtract `bir` and `sifir` and name it as `cikarma`

In [28]:
cikarma = bir - sifir
cikarma  # maris islemi oldugundan hepsi sifir oldu

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

* divide  `cikarma` and `toplam`. Then, name it as `bolme`

In [30]:
bolme = cikarma /toplam
bolme

array([[0.70488501, 0.4318663 , 0.57279046, 0.35053647, 0.4462602 ],
       [0.305574  , 0.36498886, 0.34780487, 0.28859276, 0.31319447]])

* Multiply `toplam` and `bolme` by element basis and name it as `ecarpma`

In [31]:
ecarpma = toplam * bolme
ecarpma

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

* Multiply `ecarpma` and `eye` by matrix basis and name it as `mcarpma`

In [34]:
mcarpma = ecarpma @ eye
mcarpma  # eleman bazli carpim *,  ancak matris bazli carpim @ le yapilir

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

* Create matrix `a` that has following values:

`[[ 1 2 3 4 5]
  [ 6 7 8 9 10]]`

In [36]:
# ilk degerin sutun degeri ile ikinci degerin satir degeri ayni olmalidir matris bazli carpim icin
a = np.array([[1,2,3,4,5],[6,7,8,9,10]])
a

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])

* Return the **boolean values** result of the values that are more than 3

In [37]:
a > 3

array([[False, False, False,  True,  True],
       [ True,  True,  True,  True,  True]])

* Return the values that are more than 3

In [38]:
a[a>3]

array([ 4,  5,  6,  7,  8,  9, 10])

* Set the values that are more than 3 to 0 and name the result as `a`

In [39]:
a[a>3]= 0
a

array([[1, 2, 3, 0, 0],
       [0, 0, 0, 0, 0]])

* Join `a` and `mcarpma ` by using stack functions(`axis=1`) and name it as `stc`

In [42]:
stc = np.stack((a, mcarpma), axis = 1) # sutun bazinda birlsetir, axis = 0 satir bazda birlsetir
stc

array([[[1., 2., 3., 0., 0.],
        [1., 1., 1., 1., 1.]],

       [[0., 0., 0., 0., 0.],
        [1., 1., 1., 1., 1.]]])

* Take the 1'st and 3'rd rows from `stc`, assign them to a new matrix.Name this new matrix as `guncel`

In [43]:
guncel = stc[:,0:1]
guncel

array([[[1., 2., 3., 0., 0.]],

       [[0., 0., 0., 0., 0.]]])

* Make guncel 2 dimensional array.

In [46]:
# 2 boyutluya 3 boyutludan donusturelim yukarida 3 boyutlu oldu
guncel = guncel.reshape(2,5) # iki boyutlu 5 sutunlu
guncel.ndim

2

In [51]:
# pandas numpy kullanilarak yazilmis bir kutuphane
# ornegin concatenate methodu her ikisinde de var

dataC = np.concatenate([my_data, guncel]) # joinliyecegimiz ilk ve ikinci degeri yazdik
dataC

array([[     nan,      nan,      nan,      nan,      nan],
       [2.00e+00, 5.00e+01, 1.25e+04, 9.80e+01, 1.00e+00],
       [0.00e+00, 1.30e+01, 3.25e+03, 2.80e+01, 1.00e+00],
       ...,
       [7.20e+01, 1.00e+00, 2.50e+02, 7.20e+01, 0.00e+00],
       [1.00e+00, 2.00e+00, 3.00e+00, 0.00e+00, 0.00e+00],
       [0.00e+00, 0.00e+00, 0.00e+00, 0.00e+00, 0.00e+00]])

* Sum the columns of `data`

In [52]:
dataC.sum(axis=0)

array([nan, nan, nan, nan, nan])

* Sum the rows of `data`

In [53]:
dataC.sum(axis=1)

array([       nan, 1.2651e+04, 3.2920e+03, 4.0530e+03, 5.0680e+03,
       6.1020e+03, 1.0120e+03, 1.7740e+03, 3.0480e+03, 2.2840e+03,
       1.1650e+04, 5.8350e+03, 7.5700e+02, 2.5410e+03, 3.3110e+03,
       1.5240e+03, 1.2690e+03, 3.5650e+03, 3.8170e+03, 1.5240e+03,
       7.6000e+02, 7.6000e+02, 2.7930e+03, 1.5250e+03, 1.5250e+03,
       2.2840e+03, 3.5580e+03, 1.5240e+03, 3.0510e+03, 1.2710e+03,
       2.0330e+03, 3.5730e+03, 2.5430e+03, 2.5430e+03, 2.2900e+03,
       4.0820e+03, 2.0390e+03, 3.0620e+03, 1.5270e+03, 3.5740e+03,
       1.7840e+03, 3.3190e+03, 1.2730e+03, 1.2740e+03, 1.2730e+03,
       5.0940e+03, 2.2920e+03, 2.2970e+03, 5.0600e+02, 5.0600e+02,
       5.0600e+02, 2.8090e+03, 2.8100e+03, 1.5300e+03, 3.0660e+03,
       1.2740e+03, 4.8430e+03, 2.0390e+03, 1.7880e+03, 4.0990e+03,
       1.5300e+03, 1.7880e+03, 2.0460e+03, 2.5610e+03, 1.2760e+03,
       7.6500e+02, 4.0930e+03, 1.0210e+03, 5.0600e+02, 1.7860e+03,
       2.3110e+03, 1.0220e+03, 1.0220e+03, 4.3430e+03, 5.0800e

* Return the maximum values of each column

In [56]:
dataC.max(axis=0)

array([nan, nan, nan, nan, nan])

* Return the maximum values of each row

In [57]:
dataC.max(axis=1)

array([      nan, 1.250e+04, 3.250e+03, 4.000e+03, 5.000e+03, 6.000e+03,
       1.000e+03, 1.750e+03, 3.000e+03, 2.250e+03, 1.150e+04, 5.750e+03,
       7.500e+02, 2.500e+03, 3.250e+03, 1.500e+03, 1.250e+03, 3.500e+03,
       3.750e+03, 1.500e+03, 7.500e+02, 7.500e+02, 2.750e+03, 1.500e+03,
       1.500e+03, 2.250e+03, 3.500e+03, 1.500e+03, 3.000e+03, 1.250e+03,
       2.000e+03, 3.500e+03, 2.500e+03, 2.500e+03, 2.250e+03, 4.000e+03,
       2.000e+03, 3.000e+03, 1.500e+03, 3.500e+03, 1.750e+03, 3.250e+03,
       1.250e+03, 1.250e+03, 1.250e+03, 5.000e+03, 2.250e+03, 2.250e+03,
       5.000e+02, 5.000e+02, 5.000e+02, 2.750e+03, 2.750e+03, 1.500e+03,
       3.000e+03, 1.250e+03, 4.750e+03, 2.000e+03, 1.750e+03, 4.000e+03,
       1.500e+03, 1.750e+03, 2.000e+03, 2.500e+03, 1.250e+03, 7.500e+02,
       4.000e+03, 1.000e+03, 5.000e+02, 1.750e+03, 2.250e+03, 1.000e+03,
       1.000e+03, 4.250e+03, 5.000e+02, 5.000e+02, 5.000e+02, 1.000e+03,
       5.000e+02, 5.000e+02, 5.000e+02, 1.500e+03, 

* Return the minimum values of each column

In [58]:
dataC.min(axis=0)

array([nan, nan, nan, nan, nan])

* Return the minimum values of each row

In [59]:
dataC.min(axis=1)

array([nan,  1.,  0.,  1.,  1.,  0.,  0.,  1.,  0.,  1.,  1.,  0.,  0.,
        1.,  0.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  0.,  1.,  1.,  0.,
        0.,  0.,  1.,  1.,  0.,  0.,  1.,  1.,  1.,  0.,  1.,  1.,  1.,
        1.,  1.,  1.,  0.,  1.,  0.,  1.,  1.,  0.,  0.,  0.,  0.,  0.,
        1.,  0.,  0.,  1.,  1.,  1.,  1.,  0.,  0.,  0.,  1.,  0.,  1.,
        1.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  1.,  1.,  1.,
        0.,  0.,  0.,  1.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,
        1.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,  1.,  1.,  1.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  1.,  0.,  0.,
        1.,  0.,  0.,  1.,  1.,  1.,  1.,  1.,  0.,  0.,  1.,  0.,  1.,
        1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  1.,  1.,  0.,  0.,  1.,  0.,  1.,  0.,  0.,  1.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,  1.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,  1.,  0

* Find the index of the biggest value
* Note: The value you're about to reach is the index of our `data`'s flatten value.

In [60]:
np.argmax(dataC) # 3432 sutunlu 1 satirli ornegin bir data getirip hangi index'de en buyuk degerin oldugunu osteriyor
np.argmax(dataC)

0

* Find the index of the smallest value

In [61]:
np.argmin(dataC)

0

* Transpose to `data` and set it the result as `datat`

In [63]:
# Transpose alamk satir ve sutun yerleri yer degistirilir. 2 satir
datat = dataC.T
datat

array([[     nan, 2.00e+00, 0.00e+00, ..., 7.20e+01, 1.00e+00, 0.00e+00],
       [     nan, 5.00e+01, 1.30e+01, ..., 1.00e+00, 2.00e+00, 0.00e+00],
       [     nan, 1.25e+04, 3.25e+03, ..., 2.50e+02, 3.00e+00, 0.00e+00],
       [     nan, 9.80e+01, 2.80e+01, ..., 7.20e+01, 0.00e+00, 0.00e+00],
       [     nan, 1.00e+00, 1.00e+00, ..., 0.00e+00, 0.00e+00, 0.00e+00]])