In [None]:
!wget https://calmcode.io/static/data/pokemon.json

--2024-04-15 18:36:14--  https://calmcode.io/static/data/pokemon.json
Resolving calmcode.io (calmcode.io)... 172.66.0.96, 162.159.140.98, 2606:4700:7::60, ...
Connecting to calmcode.io (calmcode.io)|172.66.0.96|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 59991 (59K) [application/json]
Saving to: ‘pokemon.json’


2024-04-15 18:36:15 (141 MB/s) - ‘pokemon.json’ saved [59991/59991]



The dataset was downloaded from calmcode.io into the Google Colab runtime.



In [None]:
import json

file_path = 'pokemon.json'

# Load the whole file: a list of dicts, one per Pokémon
with open(file_path, 'r', encoding='utf-8') as file:
    poke_dict = json.load(file)


In [None]:
poke_dict[:4]

[{'name': 'Bulbasaur',
  'type': ['Grass', 'Poison'],
  'total': 318,
  'hp': 45,
  'attack': 49},
 {'name': 'Ivysaur',
  'type': ['Grass', 'Poison'],
  'total': 405,
  'hp': 60,
  'attack': 62},
 {'name': 'Venusaur',
  'type': ['Grass', 'Poison'],
  'total': 525,
  'hp': 80,
  'attack': 82},
 {'name': 'VenusaurMega Venusaur',
  'type': ['Grass', 'Poison'],
  'total': 625,
  'hp': 80,
  'attack': 100}]

The data is loaded as a list of dictionaries, one per Pokémon, so it is not yet in a form we can query conveniently. Each dictionary has the keys name, type (a list of one or two types), total (total stats), hp, and attack.

Instead of reaching for pandas, let's work with this data using "method chaining", a pattern that combines object-oriented and functional programming: each method returns a new object, so several operations can be written as one readable pipeline. For small, nested data like this list of dictionaries, the approach stays short and easy to follow.
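Python's built-in str methods already follow this pattern, which makes them a handy illustration before we build our own class: each call returns a new string, so the calls can be chained left to right. This tiny sketch does not touch the dataset; the string is just an example.

```python
# Each str method returns a new string, so the calls can be chained
name = "  Dragonite  "
cleaned = name.strip().lower().replace("ite", "")
print(cleaned)  # dragon

# The original string is unchanged: every step produced a new object
print(repr(name))  # '  Dragonite  '
```

The Clumper class below applies the same idea to a list of dictionaries.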


In [None]:
class Clumper:
    def __init__(self, blob):
        self.blob = blob

    def keep(self, func):
        return [d for d in self.blob if func(d)]

In [None]:
Clumper(poke_dict).keep(lambda d: 'Dragon' in d['type'])

[{'name': 'CharizardMega Charizard X',
  'type': ['Fire', 'Dragon'],
  'total': 634,
  'hp': 78,
  'attack': 130},
 {'name': 'Dratini', 'type': ['Dragon'], 'total': 300, 'hp': 41, 'attack': 64},
 {'name': 'Dragonair',
  'type': ['Dragon'],
  'total': 420,
  'hp': 61,
  'attack': 84},
 {'name': 'Dragonite',
  'type': ['Dragon', 'Flying'],
  'total': 600,
  'hp': 91,
  'attack': 134},
 {'name': 'AmpharosMega Ampharos',
  'type': ['Electric', 'Dragon'],
  'total': 610,
  'hp': 90,
  'attack': 95},
 {'name': 'Kingdra',
  'type': ['Water', 'Dragon'],
  'total': 540,
  'hp': 75,
  'attack': 95},
 {'name': 'SceptileMega Sceptile',
  'type': ['Grass', 'Dragon'],
  'total': 630,
  'hp': 70,
  'attack': 110},
 {'name': 'Vibrava',
  'type': ['Ground', 'Dragon'],
  'total': 340,
  'hp': 50,
  'attack': 70},
 {'name': 'Flygon',
  'type': ['Ground', 'Dragon'],
  'total': 520,
  'hp': 80,
  'attack': 100},
 {'name': 'Altaria',
  'type': ['Dragon', 'Flying'],
  'total': 490,
  'hp': 75,
  'attack': 70

The Clumper class wraps the dataset. Its keep method takes a function (a predicate) and returns a list containing only the items for which that function returns True, so we can filter the data by any condition we like.

Suppose we want to work only with the Dragon-type Pokémon. The code above does exactly that: it keeps every item whose 'type' list contains 'Dragon'.

Instead of defining a named function with def, we pass the condition as a lambda, a small anonymous function written inline.
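To make the lambda point concrete, here is a small sketch comparing a named predicate with the equivalent inline lambda. It uses Python's built-in filter rather than Clumper, and a three-record sample whose values are copied from the outputs in this notebook; is_dragon is a name made up for the illustration.

```python
# A few records in the same shape as pokemon.json (values copied from the outputs above)
sample = [
    {'name': 'Bulbasaur', 'type': ['Grass', 'Poison'], 'hp': 45},
    {'name': 'Dratini', 'type': ['Dragon'], 'hp': 41},
    {'name': 'Dragonite', 'type': ['Dragon', 'Flying'], 'hp': 91},
]

# Named predicate defined with def (hypothetical helper name)
def is_dragon(d):
    return 'Dragon' in d['type']

# The same condition, once via the named function and once as an inline lambda
by_def = [d['name'] for d in filter(is_dragon, sample)]
by_lambda = [d['name'] for d in filter(lambda d: 'Dragon' in d['type'], sample)]

print(by_def)     # ['Dratini', 'Dragonite']
print(by_lambda)  # ['Dratini', 'Dragonite']
```

Both forms behave identically; the lambda simply saves us from naming a function we use only once.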

There you have it, a list of every Pokémon of the dragon type. While everything appears to be in order, there is one problem.

The keep method returns a plain list, and a list has no keep method, so we cannot apply a second filter to the result. Let's change the code so that keep returns a new Clumper object instead. Then we can call keep on the result as many times as we like, narrowing the data with each filter.
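The problem can be reproduced with a minimal copy of the first Clumper version. This sketch uses a small sample copied from the outputs above instead of the full file, and shows that the returned list has no keep method unless we wrap it in a Clumper again by hand.

```python
class Clumper:
    def __init__(self, blob):
        self.blob = blob

    def keep(self, func):
        # First version: returns a plain list, not a Clumper
        return [d for d in self.blob if func(d)]

# Sample records copied from the outputs above
sample = [
    {'name': 'Bulbasaur', 'type': ['Grass', 'Poison'], 'hp': 45},
    {'name': 'Dratini', 'type': ['Dragon'], 'hp': 41},
    {'name': 'Dragonite', 'type': ['Dragon', 'Flying'], 'hp': 91},
]

result = Clumper(sample).keep(lambda d: 'Dragon' in d['type'])
print(type(result).__name__)  # list

try:
    result.keep(lambda d: d['hp'] > 50)
except AttributeError as err:
    print(err)  # 'list' object has no attribute 'keep'

# Workaround without changing the class: wrap the list again by hand
refiltered = Clumper(result).keep(lambda d: d['hp'] > 50)
print([d['name'] for d in refiltered])  # ['Dragonite']
```

Wrapping by hand works but gets tedious, which is why the next version returns a Clumper directly.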


In [None]:
class Clumper:
    def __init__(self, blob):
        self.blob = blob

    def keep(self, func):
        return Clumper([d for d in self.blob if func(d)])

In [None]:
(Clumper(poke_dict)
  .keep(lambda d: 'Dragon' in d['type'])
  .keep(lambda d: d['hp'] > 100)
  .blob)

[{'name': 'Rayquaza',
  'type': ['Dragon', 'Flying'],
  'total': 680,
  'hp': 105,
  'attack': 150},
 {'name': 'RayquazaMega Rayquaza',
  'type': ['Dragon', 'Flying'],
  'total': 780,
  'hp': 105,
  'attack': 180},
 {'name': 'Garchomp',
  'type': ['Dragon', 'Ground'],
  'total': 600,
  'hp': 108,
  'attack': 130},
 {'name': 'GarchompMega Garchomp',
  'type': ['Dragon', 'Ground'],
  'total': 700,
  'hp': 108,
  'attack': 170},
 {'name': 'GiratinaAltered Forme',
  'type': ['Ghost', 'Dragon'],
  'total': 680,
  'hp': 150,
  'attack': 100},
 {'name': 'GiratinaOrigin Forme',
  'type': ['Ghost', 'Dragon'],
  'total': 680,
  'hp': 150,
  'attack': 120},
 {'name': 'Kyurem',
  'type': ['Dragon', 'Ice'],
  'total': 660,
  'hp': 125,
  'attack': 130},
 {'name': 'KyuremBlack Kyurem',
  'type': ['Dragon', 'Ice'],
  'total': 700,
  'hp': 125,
  'attack': 170},
 {'name': 'KyuremWhite Kyurem',
  'type': ['Dragon', 'Ice'],
  'total': 700,
  'hp': 125,
  'attack': 120},
 {'name': 'Zygarde50% Forme',
  '

Compared to the previous version, we made two small changes. keep now returns a Clumper object instead of a list, so we can chain keep calls one after another; and because the result of the chain is a Clumper, we read the underlying list back out with .blob at the end.

The previous code returned all Dragon-type Pokémon as a list of dictionaries. Now we want to filter that result further: suppose we want every Dragon-type Pokémon with an HP greater than 100.

With both conditions applied (the type must include "Dragon" and hp must be greater than 100), the code above returns the list shown.

So, because keep returns a Clumper object, we can chain several conditions to build up more sophisticated filters.
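A quick way to convince ourselves that chaining is just repeated filtering: in this sketch (using the Clumper version that returns a Clumper, and a four-record sample copied from the outputs above), two chained keep calls select the same items as a single keep whose condition joins both tests with `and`.

```python
class Clumper:
    def __init__(self, blob):
        self.blob = blob

    def keep(self, func):
        # Returning a new Clumper is what makes chaining possible
        return Clumper([d for d in self.blob if func(d)])

# Sample records copied from the outputs above
sample = [
    {'name': 'Bulbasaur', 'type': ['Grass', 'Poison'], 'hp': 45},
    {'name': 'Dratini', 'type': ['Dragon'], 'hp': 41},
    {'name': 'Dragonite', 'type': ['Dragon', 'Flying'], 'hp': 91},
    {'name': 'Kyurem', 'type': ['Dragon', 'Ice'], 'hp': 125},
]

# Two chained filters ...
chained = (Clumper(sample)
           .keep(lambda d: 'Dragon' in d['type'])
           .keep(lambda d: d['hp'] > 100)
           .blob)

# ... select the same items as one filter that combines both conditions
combined = Clumper(sample).keep(lambda d: 'Dragon' in d['type'] and d['hp'] > 100).blob

print([d['name'] for d in chained])  # ['Kyurem']
print(chained == combined)           # True
```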

Although chaining works, it would be more convenient to pass several criteria to a single keep call. That keeps the code shorter when a program needs many conditions.


There's a way to do that. The code can be changed to:


In [None]:
class Clumper:
    def __init__(self, blob):
        self.blob = blob

    def keep(self, *funcs):
        data = self.blob
        for func in funcs:
            data = [d for d in data if func(d)]
        return Clumper(data)

In [None]:
(Clumper(poke_dict)
  .keep(lambda d: 'Dragon' in d['type'],
        lambda d: d['hp'] > 100)
  .blob)

[{'name': 'Rayquaza',
  'type': ['Dragon', 'Flying'],
  'total': 680,
  'hp': 105,
  'attack': 150},
 {'name': 'RayquazaMega Rayquaza',
  'type': ['Dragon', 'Flying'],
  'total': 780,
  'hp': 105,
  'attack': 180},
 {'name': 'Garchomp',
  'type': ['Dragon', 'Ground'],
  'total': 600,
  'hp': 108,
  'attack': 130},
 {'name': 'GarchompMega Garchomp',
  'type': ['Dragon', 'Ground'],
  'total': 700,
  'hp': 108,
  'attack': 170},
 {'name': 'GiratinaAltered Forme',
  'type': ['Ghost', 'Dragon'],
  'total': 680,
  'hp': 150,
  'attack': 100},
 {'name': 'GiratinaOrigin Forme',
  'type': ['Ghost', 'Dragon'],
  'total': 680,
  'hp': 150,
  'attack': 120},
 {'name': 'Kyurem',
  'type': ['Dragon', 'Ice'],
  'total': 660,
  'hp': 125,
  'attack': 130},
 {'name': 'KyuremBlack Kyurem',
  'type': ['Dragon', 'Ice'],
  'total': 700,
  'hp': 125,
  'attack': 170},
 {'name': 'KyuremWhite Kyurem',
  'type': ['Dragon', 'Ice'],
  'total': 700,
  'hp': 125,
  'attack': 120},
 {'name': 'Zygarde50% Forme',
  '

Here we put a * before the parameter name funcs. This collects any number of positional arguments into a tuple, so keep can receive one function or several.

Inside keep, the functions are applied one after another: the list is filtered by the first function, and the result of that filter is passed on to the next one.

So a single keep call filters the data once per function: first by type, then by HP. The result is the same as chaining two keep calls, but it takes fewer lines.
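The sketch below shows what *funcs receives, and compares the notebook's approach (one filtering pass per function) with an alternative single pass that uses the built-in all(). The helper names collect, keep_sequential, keep_all, is_dragon and high_hp are hypothetical, and the sample records are copied from the outputs above.

```python
def collect(*funcs):
    # *funcs packs all positional arguments into a tuple
    return funcs

def keep_sequential(data, *funcs):
    # One pass per function; each pass filters the previous result
    for func in funcs:
        data = [d for d in data if func(d)]
    return data

def keep_all(data, *funcs):
    # One pass in total; all() stops at the first predicate that is False
    return [d for d in data if all(func(d) for func in funcs)]

def is_dragon(d):
    return 'Dragon' in d['type']

def high_hp(d):
    return d['hp'] > 100

# Sample records copied from the outputs above
sample = [
    {'name': 'Bulbasaur', 'type': ['Grass', 'Poison'], 'hp': 45},
    {'name': 'Dratini', 'type': ['Dragon'], 'hp': 41},
    {'name': 'Dragonite', 'type': ['Dragon', 'Flying'], 'hp': 91},
    {'name': 'Kyurem', 'type': ['Dragon', 'Ice'], 'hp': 125},
]

packed = collect(is_dragon, high_hp)
print(type(packed).__name__, len(packed))  # tuple 2

print([d['name'] for d in keep_sequential(sample, is_dragon, high_hp)])  # ['Kyurem']
print([d['name'] for d in keep_all(sample, is_dragon, high_hp)])         # ['Kyurem']
```

Both return the same items in the same order; the sequential version builds an intermediate list per function, while the all() version walks the data only once.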




Sometimes, instead of filtering by a condition, we just want the first or last few items in dataset order.

The head and tail methods do that.
Let's look at an example.


In [None]:
class Clumper:
    def __init__(self, blob):
        self.blob = blob

    def keep(self, *funcs):
        data = self.blob
        for func in funcs:
            data = [d for d in data if func(d)]
        return Clumper(data)

    def head(self, n):
        # Slicing returns at most n items, even when n > len(self.blob)
        return Clumper(self.blob[:n])

    def tail(self, n):
        # Last n items in their original order (an empty list when n == 0)
        return Clumper(self.blob[max(len(self.blob) - n, 0):])


In [None]:
(Clumper(poke_dict)
  .keep(lambda d: 'Dragon' in d['type'],
        lambda d: d['hp'] > 100)
  .head(10)
  .tail(10)
  .blob)

[{'name': 'Rayquaza',
  'type': ['Dragon', 'Flying'],
  'total': 680,
  'hp': 105,
  'attack': 150},
 {'name': 'RayquazaMega Rayquaza',
  'type': ['Dragon', 'Flying'],
  'total': 780,
  'hp': 105,
  'attack': 180},
 {'name': 'Garchomp',
  'type': ['Dragon', 'Ground'],
  'total': 600,
  'hp': 108,
  'attack': 130},
 {'name': 'GarchompMega Garchomp',
  'type': ['Dragon', 'Ground'],
  'total': 700,
  'hp': 108,
  'attack': 170},
 {'name': 'GiratinaAltered Forme',
  'type': ['Ghost', 'Dragon'],
  'total': 680,
  'hp': 150,
  'attack': 100},
 {'name': 'GiratinaOrigin Forme',
  'type': ['Ghost', 'Dragon'],
  'total': 680,
  'hp': 150,
  'attack': 120},
 {'name': 'Kyurem',
  'type': ['Dragon', 'Ice'],
  'total': 660,
  'hp': 125,
  'attack': 130},
 {'name': 'KyuremBlack Kyurem',
  'type': ['Dragon', 'Ice'],
  'total': 700,
  'hp': 125,
  'attack': 170},
 {'name': 'KyuremWhite Kyurem',
  'type': ['Dragon', 'Ice'],
  'total': 700,
  'hp': 125,
  'attack': 120},
 {'name': 'Zygarde50% Forme',
  'type': ['Dragon', 'Ground'],
  'total': 600,
  'hp': 108,
  'attack': 100}]

Here head(10) returns the first ten matching items, and tail(10) returns the last ten items of whatever it receives. Because head(10) has already cut the list down to ten items, tail(10) keeps all of them.
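How head and tail are implemented matters. Index-based loops such as `[blob[-i] for i in range(1, n+1)]` return the last items in reverse order and raise IndexError when n is larger than the list, while list slicing avoids both problems. This sketch compares the two on a short list of names taken from the output above; the four helper functions are hypothetical names used only for the comparison.

```python
def head_by_index(blob, n):
    return [blob[i] for i in range(n)]

def tail_by_index(blob, n):
    # blob[-1], blob[-2], ...: the last items, but in reverse order
    return [blob[-i] for i in range(1, n + 1)]

def head_by_slice(blob, n):
    return blob[:n]

def tail_by_slice(blob, n):
    # max(..., 0) keeps the start valid when n > len(blob); n == 0 gives []
    return blob[max(len(blob) - n, 0):]

names = ['Rayquaza', 'Garchomp', 'Kyurem', 'Zygarde50% Forme']

print(tail_by_index(names, 2))   # ['Zygarde50% Forme', 'Kyurem']
print(tail_by_slice(names, 2))   # ['Kyurem', 'Zygarde50% Forme']
print(head_by_slice(names, 10))  # all four names, no error

index_head_failed = False
try:
    head_by_index(names, 10)
except IndexError:
    index_head_failed = True
print(index_head_failed)  # True
```

A plain `blob[-n:]` would look simpler for tail, but with n == 0 it returns the whole list, which is why the sketch computes the start index explicitly.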


In [None]:
class Clumper:
    def __init__(self, blob):
        self.blob = blob

    def keep(self, *funcs):
        data = self.blob
        for func in funcs:
            data = [d for d in data if func(d)]
        return Clumper(data)

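    def head(self, n):
        # Slicing returns at most n items, even when n > len(self.blob)
        return Clumper(self.blob[:n])

    def tail(self, n):
        # Last n items in their original order (an empty list when n == 0)
        return Clumper(self.blob[max(len(self.blob) - n, 0):])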

    def select(self, *keys):
        return Clumper([{k: d[k] for k in keys} for d in self.blob])

In [None]:
(Clumper(poke_dict)
  .keep(lambda d: 'Dragon' in d['type'],
        lambda d: d['hp'] > 100)
  .select('name', 'hp')
  .head(10)
  .blob)

[{'name': 'Rayquaza', 'hp': 105},
 {'name': 'RayquazaMega Rayquaza', 'hp': 105},
 {'name': 'Garchomp', 'hp': 108},
 {'name': 'GarchompMega Garchomp', 'hp': 108},
 {'name': 'GiratinaAltered Forme', 'hp': 150},
 {'name': 'GiratinaOrigin Forme', 'hp': 150},
 {'name': 'Kyurem', 'hp': 125},
 {'name': 'KyuremBlack Kyurem', 'hp': 125},
 {'name': 'KyuremWhite Kyurem', 'hp': 125},
 {'name': 'Zygarde50% Forme', 'hp': 108}]

The select method keeps only the keys (the "columns") we ask for, here name and hp.

It works with a dictionary comprehension: for each item, it builds a new dictionary containing only the requested keys, in the order we listed them.
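Here is the dictionary comprehension from select applied to a single record (copied from the Kyurem output above). It also shows that the source record is left untouched and that asking for a missing key raises KeyError; 'speed' is a hypothetical key that does not exist in this dataset.

```python
# One record copied from the outputs above
record = {'name': 'Kyurem', 'type': ['Dragon', 'Ice'], 'total': 660, 'hp': 125, 'attack': 130}

# Keep only the requested keys, in the requested order
keys = ('name', 'hp')
selected = {k: record[k] for k in keys}
print(selected)  # {'name': 'Kyurem', 'hp': 125}

# A new dict was built; the source record still has all five keys
print(len(record))  # 5

# A key that does not exist raises KeyError ('speed' is hypothetical here)
try:
    {k: record[k] for k in ('name', 'speed')}
except KeyError as err:
    print('missing key:', err)  # missing key: 'speed'
```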


In [None]:
class Clumper:
    def __init__(self, blob):
        self.blob = blob

    def keep(self, *funcs):
        data = self.blob
        for func in funcs:
            data = [d for d in data if func(d)]
        return Clumper(data)

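    def head(self, n):
        # Slicing returns at most n items, even when n > len(self.blob)
        return Clumper(self.blob[:n])

    def tail(self, n):
        # Last n items in their original order (an empty list when n == 0)
        return Clumper(self.blob[max(len(self.blob) - n, 0):])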

    def select(self, *keys):
        return Clumper([{k: d[k] for k in keys} for d in self.blob])

    def mutate(self, **kwargs):
        # Copy each dict first so the source data (e.g. poke_dict) is not modified in place
        data = [dict(d) for d in self.blob]
        for key, func in kwargs.items():
            for d in data:
                d[key] = func(d)
        return Clumper(data)

In [None]:
(Clumper(poke_dict)
  .keep(lambda d: 'Dragon' in d['type'],
        lambda d: d['hp'] > 100)
  .select('name', 'hp')
  .mutate(hp = lambda d: d['hp'] * 2,
          hp4 = lambda d: d['hp'] * 4)
  .blob)

[{'name': 'Rayquaza', 'hp': 210, 'hp4': 840},
 {'name': 'RayquazaMega Rayquaza', 'hp': 210, 'hp4': 840},
 {'name': 'Garchomp', 'hp': 216, 'hp4': 864},
 {'name': 'GarchompMega Garchomp', 'hp': 216, 'hp4': 864},
 {'name': 'GiratinaAltered Forme', 'hp': 300, 'hp4': 1200},
 {'name': 'GiratinaOrigin Forme', 'hp': 300, 'hp4': 1200},
 {'name': 'Kyurem', 'hp': 250, 'hp4': 1000},
 {'name': 'KyuremBlack Kyurem', 'hp': 250, 'hp4': 1000},
 {'name': 'KyuremWhite Kyurem', 'hp': 250, 'hp4': 1000},
 {'name': 'Zygarde50% Forme', 'hp': 216, 'hp4': 864}]

In the example above, mutate lets us add new columns to the data or overwrite existing ones.

Each column is defined by a function, so the call itself documents how every new value is computed.

Here is how mutate works. The keyword arguments (hp and hp4 in this example) are collected into kwargs. For each key and function in kwargs.items(), mutate calls the function on every item and stores the result in that item under the key. Finally, it returns a new Clumper object.
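One subtlety: keep builds a new list but reuses the same dict objects, so a mutate that writes into those dicts also changes the original data. This sketch compares an in-place version with one that copies each dict first; mutate_in_place and mutate_copy are hypothetical helper names, and the one-record source is copied from the Kyurem output above.

```python
def mutate_in_place(blob, **kwargs):
    # Writes straight into the existing dicts
    for key, func in kwargs.items():
        for d in blob:
            d[key] = func(d)
    return blob

def mutate_copy(blob, **kwargs):
    # Copies each dict first, so the caller's data is left unchanged
    data = [dict(d) for d in blob]
    for key, func in kwargs.items():
        for d in data:
            d[key] = func(d)
    return data

# keep-style filtering: a new list, but the same dict objects
source_a = [{'name': 'Kyurem', 'hp': 125}]
filtered_a = [d for d in source_a if d['hp'] > 100]
mutate_in_place(filtered_a, hp=lambda d: d['hp'] * 2)
print(source_a[0]['hp'])  # 250: the source changed too

source_b = [{'name': 'Kyurem', 'hp': 125}]
filtered_b = [d for d in source_b if d['hp'] > 100]
result_b = mutate_copy(filtered_b, hp=lambda d: d['hp'] * 2)
print(source_b[0]['hp'], result_b[0]['hp'])  # 125 250
```

In the hp example above, select runs before mutate and already builds new dicts, so poke_dict is not affected there. In the ratio example at the end, however, keep is followed directly by mutate, so an in-place mutate would add a ratio key to the dicts inside poke_dict itself.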

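Here hp is overwritten with double its value, and a new hp4 column is added. Because the keyword arguments are applied in order, hp4 is computed from the already-doubled hp, so it ends up at eight times the original value (for Rayquaza, hp goes from 105 to 210 and hp4 becomes 840).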
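Since Python 3.6, **kwargs preserves the order in which the arguments were passed, so later functions in a mutate call see the values written by earlier ones. This sketch (with a hypothetical copying helper, mutate_copy, and a one-record sample based on Rayquaza) shows that listing hp4 before hp gives four times the original HP instead of eight.

```python
def mutate_copy(blob, **kwargs):
    data = [dict(d) for d in blob]
    # kwargs.items() follows the call order, so each function sees
    # the values written by the functions listed before it
    for key, func in kwargs.items():
        for d in data:
            d[key] = func(d)
    return data

rows = [{'name': 'Rayquaza', 'hp': 105}]

doubled_first = mutate_copy(rows, hp=lambda d: d['hp'] * 2, hp4=lambda d: d['hp'] * 4)
hp4_first = mutate_copy(rows, hp4=lambda d: d['hp'] * 4, hp=lambda d: d['hp'] * 2)

print(doubled_first)  # [{'name': 'Rayquaza', 'hp': 210, 'hp4': 840}]
print(hp4_first)      # [{'name': 'Rayquaza', 'hp': 210, 'hp4': 420}]
```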


In [None]:
class Clumper:
    def __init__(self, blob):
        self.blob = blob

    def keep(self, *funcs):
        data = self.blob
        for func in funcs:
            data = [d for d in data if func(d)]
        return Clumper(data)

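    def head(self, n):
        # Slicing returns at most n items, even when n > len(self.blob)
        return Clumper(self.blob[:n])

    def tail(self, n):
        # Last n items in their original order (an empty list when n == 0)
        return Clumper(self.blob[max(len(self.blob) - n, 0):])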

    def select(self, *keys):
        return Clumper([{k: d[k] for k in keys} for d in self.blob])

    def mutate(self, **kwargs):
        # Copy each dict first so the source data (e.g. poke_dict) is not modified in place
        data = [dict(d) for d in self.blob]
        for key, func in kwargs.items():
            for d in data:
                d[key] = func(d)
        return Clumper(data)

    def sort(self, key, reverse=False):
        return Clumper(sorted(self.blob, key=key, reverse=reverse))


In [None]:
(Clumper(poke_dict)
    .keep(lambda d: 'Dragon' in d['type'],
          lambda d: d['hp'] > 100)
    .select('name', 'hp')
    .sort(lambda d: d['hp'], reverse=True)
    .blob)


[{'name': 'GiratinaAltered Forme', 'hp': 150},
 {'name': 'GiratinaOrigin Forme', 'hp': 150},
 {'name': 'Kyurem', 'hp': 125},
 {'name': 'KyuremBlack Kyurem', 'hp': 125},
 {'name': 'KyuremWhite Kyurem', 'hp': 125},
 {'name': 'Garchomp', 'hp': 108},
 {'name': 'GarchompMega Garchomp', 'hp': 108},
 {'name': 'Zygarde50% Forme', 'hp': 108},
 {'name': 'Rayquaza', 'hp': 105},
 {'name': 'RayquazaMega Rayquaza', 'hp': 105}]

As we can see, combining the object-oriented approach (a class whose methods return new objects) with the functional approach (passing functions as arguments) gives us the results we wanted.

The sort method simply wraps Python's built-in sorted function. Without a key, sorted compares the items directly; tuples, for example, are ordered by their first member, then by the next member on ties.

To sort by something else, such as the second member of a tuple or the hp value of a dictionary, we pass a key function (usually a lambda) that returns the value to compare for each item. Dictionaries cannot be compared with each other directly, so for our data a key is always needed.

Like before, we can use functions to define what we do and how we do it.
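The sketch below shows the default tuple ordering, a key function that sorts by the second member, and why dicts need a key. The (name, hp) pairs are taken from the outputs above; rows is a hypothetical list of dicts built from them.

```python
pairs = [('Kyurem', 125), ('Garchomp', 108), ('Rayquaza', 105)]

# No key: tuples are compared member by member, starting with the first (the name)
by_name = sorted(pairs)

# key= picks what to compare; here the second member (the hp value)
by_hp = sorted(pairs, key=lambda t: t[1])

rows = [{'name': name, 'hp': hp} for name, hp in pairs]

# Dicts do not support <, so sorting them without a key fails
dict_sort_failed = False
try:
    sorted(rows)
except TypeError as err:
    dict_sort_failed = True
    print(err)  # '<' not supported between instances of 'dict' and 'dict'

# With a key function, dicts sort fine; reverse=True gives descending order
by_hp_desc = sorted(rows, key=lambda d: d['hp'], reverse=True)

print(by_name)  # [('Garchomp', 108), ('Kyurem', 125), ('Rayquaza', 105)]
print(by_hp)    # [('Rayquaza', 105), ('Garchomp', 108), ('Kyurem', 125)]
print([d['name'] for d in by_hp_desc])  # ['Kyurem', 'Garchomp', 'Rayquaza']
```

Python's sort is stable, so items with equal keys keep their original relative order. That is why, in the sorted output above, the two Giratina formes and the three Kyurem entries stay in dataset order.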


In [None]:
class Clumper:
    def __init__(self, blob):
        self.blob = blob

    def keep(self, *funcs):
        data = self.blob
        for func in funcs:
            data = [d for d in data if func(d)]
        return Clumper(data)

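    def head(self, n):
        # Slicing returns at most n items, even when n > len(self.blob)
        return Clumper(self.blob[:n])

    def tail(self, n):
        # Last n items in their original order (an empty list when n == 0)
        return Clumper(self.blob[max(len(self.blob) - n, 0):])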

    def select(self, *keys):
        return Clumper([{k: d[k] for k in keys} for d in self.blob])

    def mutate(self, **kwargs):
        # Copy each dict first so the source data (e.g. poke_dict) is not modified in place
        data = [dict(d) for d in self.blob]
        for key, func in kwargs.items():
            for d in data:
                d[key] = func(d)
        return Clumper(data)

    def sort(self, key, reverse=False):
        return Clumper(sorted(self.blob, key=key, reverse=reverse))

In [None]:
(Clumper(poke_dict)
  .keep(lambda d: 'Grass' in d['type'],
        lambda d: d['hp'] < 60)
  .mutate(ratio=lambda d: d['attack']/d['hp'])
  .select('name', 'ratio')
  .sort(lambda d: d['ratio'], reverse=True)
  .head(15)
  .blob)

Changing the analysis is easy because the chain reads from top to bottom, one step per line. We can reorder, add, or remove lines, and each line still states exactly which step is applied to the data, which makes the pipeline easy to reason about. Keep in mind that the order of the steps matters: a step can only use the keys that earlier steps have kept.
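To see why order matters, here is a sketch with a trimmed-down Clumper (only the methods needed here, with slicing-based head and a copying mutate) and three records copied from the first output above. Sorting before head gives a different row than head before sort, and a select that drops attack breaks a later mutate that needs it.

```python
class Clumper:
    """Reduced sketch of the notebook's Clumper, keeping only the methods used here."""

    def __init__(self, blob):
        self.blob = blob

    def head(self, n):
        return Clumper(self.blob[:n])

    def select(self, *keys):
        return Clumper([{k: d[k] for k in keys} for d in self.blob])

    def mutate(self, **kwargs):
        data = [dict(d) for d in self.blob]
        for key, func in kwargs.items():
            for d in data:
                d[key] = func(d)
        return Clumper(data)

    def sort(self, key, reverse=False):
        return Clumper(sorted(self.blob, key=key, reverse=reverse))

# Records copied from the first output above
sample = [
    {'name': 'Bulbasaur', 'hp': 45, 'attack': 49},
    {'name': 'Ivysaur', 'hp': 60, 'attack': 62},
    {'name': 'Venusaur', 'hp': 80, 'attack': 82},
]

# Sort first, then take the top row: the highest hp overall
sort_then_head = Clumper(sample).sort(lambda d: d['hp'], reverse=True).head(1).blob
# Take the first row, then sort: only Bulbasaur is left to sort
head_then_sort = Clumper(sample).head(1).sort(lambda d: d['hp'], reverse=True).blob

print(sort_then_head[0]['name'])  # Venusaur
print(head_then_sort[0]['name'])  # Bulbasaur

# select drops 'attack', so a later mutate that needs it fails
mutate_failed = False
try:
    Clumper(sample).select('name', 'hp').mutate(ratio=lambda d: d['attack'] / d['hp'])
except KeyError as err:
    mutate_failed = True
    print('KeyError:', err)  # KeyError: 'attack'
```

In the notebook's ratio example, mutate runs before select for exactly this reason.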

This method-chaining style is not a replacement for pandas: on large tabular datasets, pandas' vectorized operations will generally be much faster than these pure-Python loops. Its strengths lie elsewhere. It works directly on nested, JSON-like data (such as the list-valued type field), every step is spelled out with a plain Python function, and the whole API fits in a few lines that we can read and extend ourselves.