# Creating a Simple Version of the Pandas Library
In this project, we will build a simplified version of the pandas library that Wes McKinney created. The questions we are trying to answer are:  
 - Which song on Spotify had the highest number of plays in one day?
 - Which song on Spotify had the lowest number of plays in one day?

### Requirements of BabyPandas Class
BabyPandas (**Class**) should simplify the process of loading, previewing, manipulating, and making calculations (**Methods**) with our data (**Attribute**).  
  
Preview the data:  
 - View first five rows
 - View the shape of our data
 - View the data types for each column
 
Manipulate the data:  
 - Add new columns
 - Apply values to the columns
 - Subset the data
 - Change data type 
 
Make calculations:  
 - Find maximum
 - Find minimum
 - Find mean
 - Find standard deviation

##### Items Needed:
 - BabyPandas: **Class**
 - Load: **Method**
 - Filename: **Attribute**
 - Data: **Attribute**
 - Columns: **Attribute**
  
  
 - View first five rows: **Method**
 - View the shape of our data: **Method**
 - View the data types for each column: **Method**
   
   
 - Add new columns: **Method**
 - Apply values to the columns: **Method**
 - Subset the data: **Method**
 - Change data type: **Method**
  
  
 - Maximum: **Method**
 - Minimum: **Method**
 - Mean: **Method**
 - Standard deviation: **Method**

### Creating The Class

In [1]:
import csv
from statistics import mean, median, mode, stdev

class BabyPandas():
    def __init__(self, filename):
        self.filename = filename
    
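    def read_data(self):
        # Open with a context manager so the file is closed after reading;
        # newline='' is what the csv module recommends for file objects
        with open(self.filename, 'r', newline='') as f:
            self.data = list(csv.reader(f))
        # Number of columns, taken from the header row
        self.columns = len(self.data[0])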
        
    def head(self):
        return self.data[:5]
    
    def info(self):
        headers = list(self.data[0])
        types = []
        for value in self.data[1]:
            types.append(type(value))
        col_type = {}
        for i in range(len(headers)):
            col_type[headers[i]] = types[i]
        return col_type
    
    def shape(self):
        return (len(self.data[1:]), self.columns)
    
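    def new_column(self, col_name):
        # Add the name to the header row and an empty value to every data row
        self.data[0].append(col_name)
        for row in self.data[1:]:
            row.append('')
        self.columns = len(self.data[0])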
        
    def apply(self, column_name, new_value):
        for pos, col in enumerate(self.data[0]):
            if col == column_name:
                column_index = pos
        
        for data in self.data[1:]:
            data[column_index] = new_value
            
    def change_type(self, column_name, function):
        for pos, col in enumerate(self.data[0]):
            if col == column_name:
                column_index = pos
        
        for data in self.data[1:]:
            data[column_index] = function(data[column_index])
            
    def subset(self, column_name, row_value):
        for pos, col in enumerate(self.data[0]):
            if col == column_name:
                column_index = pos
        
        subset_data = []
        for data in self.data[1:]:
            if row_value in data:
                subset_data.append(data[column_index])
        return subset_data

    
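    def summary_stats(self, column_name):
        for pos, col in enumerate(self.data[0]):
            if col == column_name:
                column_index = pos

        # Collect the column's values; convert them with change_type first,
        # because csv.reader loads every field as a string
        num_data = [row[column_index] for row in self.data[1:]]
        # Use distinct names so the imported mean/median functions are not shadowed
        col_mean = mean(num_data)
        col_std = stdev(num_data)
        col_median = median(num_data)
        
        print("Mean is {mean}".format(mean=col_mean))
        print("Standard Deviation is {std}".format(std=col_std))
        print("Median is {median}".format(median=col_median))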
        
            
    def minimum(self, column):
        for pos, col in enumerate(self.data[0]):
            if col == column:
                column_index = pos

        ## Find min value
        ## row[1] and row[2] hold the track name and artist in this dataset
        col_data = []
        for row in self.data[1:]:
            col_data.append([row[1],row[2],row[column_index]])
        
        return min(col_data, key= lambda x: x[2])
    
    def maximum(self, column):
        for pos, col in enumerate(self.data[0]):
            if col == column:
                column_index = pos
        ## Find max value
        ## row[1] and row[2] hold the track name and artist in this dataset
        col_data = []
        for row in self.data[1:]:
            col_data.append([row[1],row[2],row[column_index]])
        return max(col_data, key= lambda x: x[2])
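
The class imports `mean`, `median`, and `stdev` from the standard `statistics` module. As a small standalone sketch (the numbers below are made up for illustration and are not from the Spotify data), note that `stdev` is the sample standard deviation, while `pstdev` is the population version:

```python
from statistics import mean, median, stdev, pstdev

# Made-up values, for illustration only
values = [2, 4, 4, 4, 5, 5, 7, 9]

avg = mean(values)            # arithmetic mean
mid = median(values)          # middle value (average of the two middle values here)
sample_sd = stdev(values)     # sample standard deviation (divides by n - 1)
population_sd = pstdev(values)  # population standard deviation (divides by n)

print(avg, mid, sample_sd, population_sd)
```

Which of the two standard deviations is appropriate depends on whether the rows are treated as a sample or as the full population.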


In [5]:
s = BabyPandas("music_data.csv")
s.read_data()

print(s.info())
print(s.shape())
print(s.columns)

s.change_type('Streams',int)
print(s.maximum("Streams"))
print(s.minimum("Streams"))


{'': <class 'str'>, 'Streams': <class 'str'>, 'Track Name': <class 'str'>, 'Date': <class 'str'>, 'Region': <class 'str'>, 'Artist': <class 'str'>}
(37100, 6)
6
['Despacito (Featuring Daddy Yankee)', 'Luis Fonsi', 64238]
['Por Fin Te Encontré', 'Cali Y El Dandee', 1993]
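
The `info()` output above reports `str` for every column because `csv.reader` returns each field as a string. That is why `change_type('Streams', int)` is called before `maximum` and `minimum`: strings compare character by character, not numerically. A minimal sketch of the difference, using a hypothetical three-row sample rather than the real file:

```python
import csv
import io

# Hypothetical sample standing in for music_data.csv
sample = io.StringIO("Track Name,Streams\nSong A,900\nSong B,64238\nSong C,1993\n")
rows = list(csv.reader(sample))
header, body = rows[0], rows[1:]

# csv.reader yields every field as a string
field_type = type(body[0][1])
print(field_type)

# String comparison is lexicographic, so '900' ranks above '64238'
string_max = max(body, key=lambda r: r[1])
print(string_max)

# After converting to int, the comparison is numeric
converted = [[name, int(streams)] for name, streams in body]
numeric_max = max(converted, key=lambda r: r[1])
print(numeric_max)
```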


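 - The song with the highest number of plays in one day was Despacito (Featuring Daddy Yankee) by Luis Fonsi, with 64,238 streams
 - The song with the lowest number of plays in one day was Por Fin Te Encontré by Cali Y El Dandee, with 1,993 streams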

### Next Steps
 - It could be interesting to subset the data by year or month and determine the most popular song and artist for each period
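
One way the monthly breakdown might be approached is sketched below. It is only an illustration under stated assumptions: the function name `top_track_by_month` is hypothetical, the `Date` values are assumed to be formatted as `YYYY-MM-DD` (the real file's format is not shown here), "most popular" is taken to mean the highest total streams within the month, and the rows are made up.

```python
from collections import defaultdict

def top_track_by_month(rows):
    """Return {'YYYY-MM': (track, artist, total_streams)} for each month.

    rows: iterable of (track, artist, streams, date) tuples, where date is
    assumed to be a 'YYYY-MM-DD' string.
    """
    # Sum streams per (month, track, artist)
    totals = defaultdict(int)
    for track, artist, streams, date in rows:
        month = date[:7]  # 'YYYY-MM'
        totals[(month, track, artist)] += int(streams)

    # Keep the track with the largest monthly total
    best = {}
    for (month, track, artist), total in totals.items():
        if month not in best or total > best[month][2]:
            best[month] = (track, artist, total)
    return best

# Made-up rows, for illustration only
rows = [
    ("Song A", "Artist X", 100, "2017-01-01"),
    ("Song A", "Artist X", 150, "2017-01-02"),
    ("Song B", "Artist Y", 200, "2017-01-01"),
    ("Song B", "Artist Y", 300, "2017-02-01"),
]
result = top_track_by_month(rows)
print(result)
```

Grouping by year instead would only require slicing the date with `date[:4]`.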