# Chapter 4: Real-World Data Representation Using Tensors
*   Representing different types of real-world data as PyTorch tensors
*   Working with a range of data types, including spreadsheet, time series, text, image, and medical imaging data
*   Loading data from file
*   Converting data to tensors
*   Shaping tensors so they can be used as inputs for neural network models

This chapter covers different types of data and how to represent them as tensors. You will also learn how to load data from the most common on-disk formats and get a feel for the structure of those data types, so you can see how to prepare them for training a neural network.


### **(0) Data Preparation**


In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
# Import the data from the GitHub repo
url = "https://raw.githubusercontent.com/deep-learning-with-pytorch/dlwpt-code/master/data/p1ch4/bike-sharing-dataset/hour-fixed.csv"
df = pd.read_csv(url)
df.head(10)

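| | instant | dteday | season | yr | mnth | hr | holiday | weekday | workingday | weathersit | temp | atemp | hum | windspeed | casual | registered | cnt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2011-01-01 | 1 | 0 | 1 | 0 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.81 | 0.0 | 3 | 13 | 16 |
| 1 | 2 | 2011-01-01 | 1 | 0 | 1 | 1 | 0 | 6 | 0 | 1 | 0.22 | 0.2727 | 0.8 | 0.0 | 8 | 32 | 40 |
| 2 | 3 | 2011-01-01 | 1 | 0 | 1 | 2 | 0 | 6 | 0 | 1 | 0.22 | 0.2727 | 0.8 | 0.0 | 5 | 27 | 32 |
| 3 | 4 | 2011-01-01 | 1 | 0 | 1 | 3 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.75 | 0.0 | 3 | 10 | 13 |
| 4 | 5 | 2011-01-01 | 1 | 0 | 1 | 4 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.75 | 0.0 | 0 | 1 | 1 |
| 5 | 6 | 2011-01-01 | 1 | 0 | 1 | 5 | 0 | 6 | 0 | 2 | 0.24 | 0.2576 | 0.75 | 0.0896 | 0 | 1 | 1 |
| 6 | 7 | 2011-01-01 | 1 | 0 | 1 | 6 | 0 | 6 | 0 | 1 | 0.22 | 0.2727 | 0.8 | 0.0 | 2 | 0 | 2 |
| 7 | 8 | 2011-01-01 | 1 | 0 | 1 | 7 | 0 | 6 | 0 | 1 | 0.2 | 0.2576 | 0.86 | 0.0 | 1 | 2 | 3 |
| 8 | 9 | 2011-01-01 | 1 | 0 | 1 | 8 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.75 | 0.0 | 1 | 7 | 8 |
| 9 | 10 | 2011-01-01 | 1 | 0 | 1 | 9 | 0 | 6 | 0 | 1 | 0.32 | 0.3485 | 0.76 | 0.0 | 8 | 6 | 14 |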


In [3]:
df.shape

(17520, 17)

In [4]:
df.describe()

| | instant | season | yr | mnth | hr | holiday | weekday | workingday | weathersit | temp | atemp | hum | windspeed | casual | registered | cnt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 17520.0 | 17520.0 | 17520.0 | 17520.0 | 17520.0 | 17520.0 | 17520.0 | 17520.0 | 17520.0 | 17520.0 | 17520.0 | 17520.0 | 17520.0 | 17520.0 | 17520.0 | 17520.0 |
| mean | 8652.180023 | 2.494521 | 0.5 | 6.515068 | 11.5 | 0.028767 | 2.999258 | 0.683562 | 1.429224 | 0.495291 | 0.474215 | 0.628025 | 0.190663 | 35.395205 | 152.620205 | 187.938299 |
| std | 5036.025134 | 1.109442 | 0.500014 | 3.449604 | 6.922384 | 0.167156 | 2.004034 | 0.465099 | 0.642868 | 0.193208 | 0.172492 | 0.193071 | 0.122946 | 49.205358 | 151.30484 | 181.447622 |
| min | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.02 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 4286.75 | 2.0 | 0.0 | 4.0 | 5.75 | 0.0 | 1.0 | 0.0 | 1.0 | 0.34 | 0.3333 | 0.48 | 0.1045 | 4.0 | 32.0 | 38.0 |
| 50% | 8645.5 | 2.5 | 0.5 | 7.0 | 11.5 | 0.0 | 3.0 | 1.0 | 1.0 | 0.5 | 0.4848 | 0.63 | 0.194 | 16.0 | 114.0 | 140.0 |
| 75% | 13015.25 | 3.0 | 1.0 | 10.0 | 17.25 | 0.0 | 5.0 | 1.0 | 2.0 | 0.66 | 0.6212 | 0.79 | 0.2537 | 48.0 | 219.0 | 280.0 |
| max | 17379.0 | 4.0 | 1.0 | 12.0 | 23.0 | 1.0 | 6.0 | 1.0 | 4.0 | 1.0 | 1.0 | 1.0 | 0.8507 | 367.0 | 886.0 | 977.0 |


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17520 entries, 0 to 17519
Data columns (total 17 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   instant     17520 non-null  int64  
 1   dteday      17520 non-null  object 
 2   season      17520 non-null  int64  
 3   yr          17520 non-null  int64  
 4   mnth        17520 non-null  int64  
 5   hr          17520 non-null  int64  
 6   holiday     17520 non-null  int64  
 7   weekday     17520 non-null  int64  
 8   workingday  17520 non-null  int64  
 9   weathersit  17520 non-null  int64  
 10  temp        17520 non-null  float64
 11  atemp       17520 non-null  float64
 12  hum         17520 non-null  float64
 13  windspeed   17520 non-null  float64
 14  casual      17520 non-null  int64  
 15  registered  17520 non-null  int64  
 16  cnt         17520 non-null  int64  
dtypes: float64(4), int64(12), object(1)
memory usage: 2.3+ MB


In [6]:
import torch
torch.set_printoptions(edgeitems=2,threshold=50)

In [7]:
# Load the CSV as float32. The converter turns the date string in column 1
# (e.g. "2011-01-01") into a number: the day of the month.
bike_numpy = np.loadtxt(url,
                        dtype=np.float32,
                        delimiter=",",
                        skiprows=1,
                        converters={1: lambda x: float(x[8:10])})
# Wrap the NumPy array as a tensor (no copy is made)
bikes = torch.from_numpy(bike_numpy)
bikes

tensor([[1.0000e+00, 1.0000e+00,  ..., 1.3000e+01, 1.6000e+01],
        [2.0000e+00, 1.0000e+00,  ..., 3.2000e+01, 4.0000e+01],
        ...,
        [1.7378e+04, 3.1000e+01,  ..., 4.8000e+01, 6.1000e+01],
        [1.7379e+04, 3.1000e+01,  ..., 3.7000e+01, 4.9000e+01]])
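
The converter passed to `np.loadtxt` above slices characters 8 and 9 out of each `dteday` field, which is where the day sits in a `YYYY-MM-DD` date. As a minimal sketch, the same slicing can be checked in isolation; the helper name `day_of_month` is made up for illustration. Depending on the NumPy version, the converter may be handed `bytes` rather than `str`, and the slice works for both.

```python
def day_of_month(date_field):
    """Return the day of the month from a 'YYYY-MM-DD' field as a float."""
    # Characters 8 and 9 hold the two-digit day; float() accepts str and bytes.
    return float(date_field[8:10])

first_day = day_of_month("2011-01-01")
last_day = day_of_month(b"2012-12-31")
print(first_day, last_day)
```

Note that this keeps only the day and discards the year and month, which is acceptable here because `yr` and `mnth` are already separate columns.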

In [8]:
bikes.shape,bikes.stride()

(torch.Size([17520, 17]), (17, 1))
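
The stride `(17, 1)` reported above says how many storage elements to skip to move one step along each dimension: 17 to reach the next row, 1 to reach the next column. As a rough sketch, assuming a contiguous row-major layout, the strides and the flat storage offset of an element can be derived from the shape alone. `contiguous_strides` and `flat_offset` are illustrative helpers, not PyTorch functions.

```python
def contiguous_strides(shape):
    """Strides (in elements) of a contiguous row-major tensor of this shape."""
    strides = []
    step = 1
    # Walk from the last dimension to the first, accumulating the step size.
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))

def flat_offset(index, strides):
    """Position of an element in the underlying 1D storage."""
    return sum(i * s for i, s in zip(index, strides))

hourly_strides = contiguous_strides((17520, 17))
# Row 2, column 16 (the last column of the third row).
offset = flat_offset((2, 16), hourly_strides)
print(hourly_strides, offset)
```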

For every hour, the dataset reports the following variables:

1.   *instant*: index of record
2.   *day*: day of the month
3.   *season*: (1-spring, 2-summer, 3-fall, 4-winter)
4.   *yr*: (0-2011, 1-2012)
5.   *mnth*: (1 to 12)
6.   *hr*: (0 to 23)
7.   *holiday*: holiday status
8.   *weekday*: day of the week
9.   *workingday*: working day status
10.  *weathersit*: weather situation (1-clear, 2-mist, 3-light rain/snow, 4-heavy rain/snow)
11.  *temp*: temperature in °C (normalized; values lie in [0, 1])
12.  *atemp*: perceived temperature in °C (normalized; values lie in [0, 1])
13.  *hum*: humidity
14.  *windspeed*: the wind speed
15.  *casual*: number of casual users
16.  *registered*: number of registered users
17.  *cnt*: count of rental bikes
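
Because the tensor carries no column names, it can help to keep a name-to-index lookup next to it. The sketch below is one possible convention, not part of the original notebook. The names follow the CSV header, except that `day` replaces `dteday` after the conversion, and the list above is 1-based while tensor indices are 0-based.

```python
# Column names in the order they appear in the tensor (0-based positions).
columns = [
    "instant", "day", "season", "yr", "mnth", "hr", "holiday", "weekday",
    "workingday", "weathersit", "temp", "atemp", "hum", "windspeed",
    "casual", "registered", "cnt",
]

# Map each name to its column index.
col_index = {name: i for i, name in enumerate(columns)}

print(col_index["temp"], col_index["cnt"])
```

With the `bikes` tensor from the cells above, `bikes[:, col_index["cnt"]]` would then select the rental counts for every hour.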



In [10]:
daily_bikes=bikes.view(-1,24,bikes.shape[1])
daily_bikes.shape,daily_bikes.stride()

(torch.Size([730, 24, 17]), (408, 17, 1))
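
The call to `view` above works because the 17520 hourly rows divide evenly into 730 days of 24 hours; the `-1` lets PyTorch infer the number of days. It also assumes the rows are sorted by time, so that each block of 24 consecutive rows is one day. A small sketch on a toy tensor (48 made-up hourly rows with 3 columns, not the bike data) shows the same reshaping and that the result shares storage with the original.

```python
import torch

# Toy stand-in for the hourly data: 48 hours (2 days), 3 columns.
hourly = torch.arange(48 * 3, dtype=torch.float32).reshape(48, 3)

# Group every 24 consecutive rows into one day; -1 infers the number of days.
daily = hourly.view(-1, 24, hourly.shape[1])
print(daily.shape, daily.stride())

# Hour 0 of day 1 is row 24 of the hourly tensor.
same_row = torch.equal(daily[1, 0], hourly[24])

# view does not copy: both tensors point at the same storage.
shares_storage = daily.data_ptr() == hourly.data_ptr()
print(same_row, shares_storage)
```

The new leading stride is 24 times the row stride (72 in the toy case, 408 = 24 × 17 for the bike data), which is why no data has to move.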