# List Comprehensions and Lambda Functions

In [1]:
# Load data, and have a look
import json

with open("hn_2014.json") as f:  # context manager closes the file afterwards
    hn = json.load(f)
print(json.dumps(hn[0], sort_keys=True, indent=2))

{
  "author": "dragongraphics",
  "createdAt": "2014-05-29T08:07:50Z",
  "createdAtI": 1401350870,
  "numComments": 0,
  "objectId": "7815238",
  "points": 2,
  "storyText": "",
  "tags": [
    "story",
    "author_dragongraphics",
    "story_7815238"
  ],
  "title": "Are we getting too Sassy? Weighing up micro-optimisation vs. maintainability",
  "url": "http://ashleynolan.co.uk/blog/are-we-getting-too-sassy"
}


In [2]:
# Remove "createdAtI" key
hn_clean = []
for story in hn:
    story_copy = story.copy()     # don't modify the original
    del story_copy["createdAtI"]  # remove unwanted key
    hn_clean.append(story_copy)
    
hn_clean[0]

{'author': 'dragongraphics',
 'numComments': 0,
 'points': 2,
 'url': 'http://ashleynolan.co.uk/blog/are-we-getting-too-sassy',
 'storyText': '',
 'createdAt': '2014-05-29T08:07:50Z',
 'tags': ['story', 'author_dragongraphics', 'story_7815238'],
 'title': 'Are we getting too Sassy? Weighing up micro-optimisation vs. maintainability',
 'objectId': '7815238'}
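
The loop in `In [2]` can also be written as a list comprehension. The following is a minimal sketch rather than the notebook's own code: `sample_stories` is a hypothetical two-story stand-in for `hn` so the block runs on its own, and the helper name `del_key` is an assumption.

```python
# Hypothetical stand-in for the `hn` list loaded above.
sample_stories = [
    {"author": "a", "createdAtI": 1401350870, "points": 2},
    {"author": "b", "createdAtI": 1401350758, "points": 1},
]

def del_key(dict_, key):
    """Return a copy of dict_ without key; the original is left untouched."""
    modified = dict_.copy()
    del modified[key]
    return modified

# One expression replaces the four-line loop.
sample_clean = [del_key(story, "createdAtI") for story in sample_stories]
print(sample_clean[0])  # {'author': 'a', 'points': 2}
```

Because `del` is a statement, it cannot appear directly inside a comprehension, which is why the copy-and-delete step is wrapped in a small function here.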

In [3]:
# Convert clean JSON to a dataframe
import pandas as pd
hn_df = pd.DataFrame(hn_clean)
hn_df

index,author,numComments,points,url,storyText,createdAt,tags,title,objectId
0,dragongraphics,0,2,http://ashleynolan.co.uk/blog/are-we-getting-t...,,2014-05-29T08:07:50Z,"[story, author_dragongraphics, story_7815238]",Are we getting too Sassy? Weighing up micro-op...,7815238
1,jcr,0,1,http://spectrum.ieee.org/automaton/robotics/ho...,,2014-05-29T08:05:58Z,"[story, author_jcr, story_7815234]",Telemba Turns Your Old Roomba and Tablet Into ...,7815234
2,callum85,0,1,http://online.wsj.com/articles/apple-to-buy-be...,,2014-05-29T08:05:06Z,"[story, author_callum85, story_7815230]",Apple Agrees to Buy Beats for $3 Billion,7815230
3,d3v3r0,0,1,http://alexsblog.org/2014/05/29/dont-wait-for-...,,2014-05-29T08:00:08Z,"[story, author_d3v3r0, story_7815222]",Don’t wait for inspiration,7815222
4,timmipetit,0,1,http://techcrunch.com/2014/05/28/hackerone-get...,,2014-05-29T07:46:19Z,"[story, author_timmipetit, story_7815191]",HackerOne Get $9M In Series A Funding To Build...,7815191
...,...,...,...,...,...,...,...,...,...
35801,lispython,0,3,https://medium.com/p/ff5f4c9b16bd,,2014-01-01T00:33:42Z,"[story, author_lispython, story_6993601]",Engelbart and Kay,6993601
35802,co_pl_te,0,3,http://allthingsd.com/20131231/you-say-goodbye...,,2014-01-01T00:19:47Z,"[story, author_co_pl_te, story_6993568]",You Say Goodbye and We Say Hello,6993568
35803,maurorm,0,1,http://ghiraldelli.pro.br/jesus-e-eu/,,2014-01-01T00:11:06Z,"[story, author_maurorm, story_6993544]",Jesus e eu,6993544
35804,yeukhon,0,1,,https:&#x2F;&#x2F;fundraising.mozilla.org&#x2F;,2014-01-01T00:06:59Z,"[story, author_yeukhon, story_6993536]",Mozilla end-of-year fundraising jumps from $75...,6993536


In [4]:
# Investigate tags column

# check type(s) - all should be lists
print("tags value types: ")
print(hn_df["tags"].apply(type).value_counts())

print("-----------------------")

# check list length 
print("tags list lengths:")
print(hn_df["tags"].apply(len).value_counts())

tags value types: 
<class 'list'>    35806
Name: tags, dtype: int64
-----------------------
tags list lengths:
3    33459
4     2347
Name: tags, dtype: int64


In [5]:
# Look at tags items with length of 4
mask = hn_df["tags"].apply(len) == 4
hn_df.loc[mask, "tags"]

43       [story, author_alamgir_mand, story_7813869, sh...
86         [story, author_cweagans, story_7812404, ask_hn]
104      [story, author_nightstrike789, story_7812099, ...
107      [story, author_ISeemToBeAVerb, story_7812048, ...
109         [story, author_Swizec, story_7812018, show_hn]
                               ...                        
35747      [story, author_rpm4321, story_6994970, show_hn]
35759            [story, author_ct, story_6994828, ask_hn]
35778    [story, author_ChrisNorstrom, story_6994370, a...
35787    [story, author_benjamincburns, story_6994163, ...
35792      [story, author_randall, story_6993981, show_hn]
Name: tags, Length: 2347, dtype: object

In [7]:
# drop the fourth tag where len == 4, keeping the first three
hn_df["tags"] = hn_df["tags"].apply(lambda l: l[:3] if len(l) == 4 else l)
hn_df["tags"].apply(len).value_counts()

3    35806
Name: tags, dtype: int64

# List Comprehensions and Lambda Functions


## Syntax

---

### WORKING WITH JSON FILES

- Load a JSON data set from a file into Python objects:

```python
import json

with open('filename.json') as f:
    data = json.load(f)
```

- Convert JSON data from a string to Python objects:

```python
json.loads(json_string)
```

- Convert JSON data stored in Python objects to string form:

```python
json.dumps(json_obj)
```
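
To see how `json.loads()` and `json.dumps()` relate, here is a small round-trip sketch. The JSON string is made up for illustration; any valid JSON text should behave the same way.

```python
import json

# Hypothetical JSON text, not taken from the data set above.
json_string = '[{"name": "Sabine", "age": 36, "active": true, "nickname": null}]'

records = json.loads(json_string)    # JSON text -> Python objects
print(type(records))                 # <class 'list'>
print(type(records[0]))              # <class 'dict'>
print(records[0]["active"], records[0]["nickname"])  # True None

# Python objects -> JSON text; sort_keys and indent make it easier to read.
print(json.dumps(records[0], sort_keys=True, indent=2))

# Converting to a string and back yields equal Python objects.
round_trip = json.loads(json.dumps(records))
print(round_trip == records)         # True
```

Note that JSON `true` and `null` come back as Python `True` and `None`.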

---

### LIST COMPREHENSIONS

#### Converting a for loop to a list comprehension

- Using a for loop:

```python
letters = ['a', 'b', 'c', 'd']
caps = []
for l in letters:
    caps.append(l.upper())
```
    
- Using a list comprehension:

```python
caps = [l.upper() for l in letters]
```

#### Common list comprehension patterns

- Transforming a list:

```python
ints = [25, 14, 13, 84, 43, 6, 77, 56]
doubled_ints = [i * 2 for i in ints]
```

- Creating test data:

```python
tenths = [i/10 for i in range(5)]
```

- Reducing a list:

```python
big_ints = [i for i in ints if i >= 50]
```
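
The transforming and reducing patterns can be combined in one comprehension. A short sketch using the same `ints` list, with the equivalent loop alongside for comparison (the names `doubled_big_ints` and `loop_result` are illustrative):

```python
ints = [25, 14, 13, 84, 43, 6, 77, 56]

# Keep only values of at least 50, then double each one that remains.
doubled_big_ints = [i * 2 for i in ints if i >= 50]
print(doubled_big_ints)  # [168, 154, 112]

# The same logic as an explicit loop.
loop_result = []
for i in ints:
    if i >= 50:
        loop_result.append(i * 2)

print(loop_result == doubled_big_ints)  # True
```

The `if` clause is evaluated first for each item, so the transformation only runs on items that pass the filter.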

---

### LAMBDA FUNCTIONS

#### Converting a definition to a lambda function

- Defining a function:

```python
def double(x):
    return x * 2
```

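- Defining the same function as a lambda, passed directly as an argument (`run_function` is a placeholder for any function that accepts a function):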

```python
run_function(function=lambda x: x * 2)
```
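
Since `run_function` is only a placeholder, the sketch below supplies a hypothetical definition so the comparison can actually be run, and then shows a lambda used as a `key` argument to the built-in `sorted()`, which is a common place to meet one.

```python
def run_function(function):
    """Hypothetical helper: call the supplied function on a fixed input."""
    return function(10)

def double(x):
    return x * 2

# A named function and an equivalent lambda give the same result.
named_result = run_function(function=double)
lambda_result = run_function(function=lambda x: x * 2)
print(named_result, lambda_result)  # 20 20

# Lambda as a sort key: order words by length (ties keep their original order).
words = ["banana", "fig", "cherry"]
by_length = sorted(words, key=lambda w: len(w))
print(by_length)  # ['fig', 'banana', 'cherry']
```

A lambda suits short, one-off logic like this; anything longer or reused is usually clearer as a named `def`.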

---

### THE TERNARY OPERATOR

- Create a one-line version of an if/else statement:

```python
"val_1 is bigger" if val_1 > val_2 else "val_1 is not bigger"
```
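
For comparison, here is the statement form next to the ternary form, followed by a ternary inside a lambda. The values and tag lists are hypothetical and chosen only for illustration.

```python
val_1, val_2 = 7, 3  # hypothetical values

# Statement form: four lines.
if val_1 > val_2:
    message = "val_1 is bigger"
else:
    message = "val_1 is not bigger"

# Ternary form: one expression with the same result.
message_ternary = "val_1 is bigger" if val_1 > val_2 else "val_1 is not bigger"
print(message == message_ternary)  # True

# A lambda body must be a single expression, so a ternary is how it branches.
tag_lists = [
    ["story", "author_x", "story_1", "ask_hn"],
    ["story", "author_y", "story_2"],
]
trimmed = list(map(lambda tags: tags[:3] if len(tags) == 4 else tags, tag_lists))
print(trimmed)  # [['story', 'author_x', 'story_1'], ['story', 'author_y', 'story_2']]
```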

## Concepts

- JSON is a language-independent format for storing structured data.
   - In Python, it can be represented by nested lists, dictionaries, strings, numbers, booleans, and `None`.
- A list comprehension provides a concise way of creating a list in a single line of code. To build one, you:
   - Start with an iterable object
   - Optionally transform the items in the iterable object
   - Optionally reduce the items in the iterable object using an if statement
   - Create a new list
- Lambda functions are anonymous functions defined in a single expression, which lets you define a function at the point where you need it.
- The ternary operator can be used to replace an if/else statement with a single line.
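
The four concepts above can be combined in a few lines. This is an illustrative sketch only: the JSON text is invented, with field names borrowed from the Hacker News data for familiarity.

```python
import json

# Hypothetical JSON text standing in for the contents of a file.
json_text = """
[
  {"title": "Ask HN: Favourite editor?", "points": 12, "numComments": 30},
  {"title": "Show HN: My side project", "points": 4, "numComments": 0},
  {"title": "A long read", "points": 40, "numComments": 8}
]
"""
stories = json.loads(json_text)  # nested lists and dictionaries

# List comprehension: iterate, transform (take the title), reduce (has comments).
discussed_titles = [s["title"] for s in stories if s["numComments"] > 0]
print(discussed_titles)  # ['Ask HN: Favourite editor?', 'A long read']

# Lambda: a throwaway key function defined where it is needed.
top_story = max(stories, key=lambda s: s["points"])
print(top_story["title"])  # A long read

# Ternary operator inside a comprehension: label each story.
labels = ["popular" if s["points"] >= 10 else "quiet" for s in stories]
print(labels)  # ['popular', 'quiet', 'popular']
```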

## Resources

- [Official JSON specification](https://www.json.org/)
- [Python Documentation: JSON Module](https://docs.python.org/3.7/library/json.html#module-json)
- [Python Documentation: List Comprehensions](https://docs.python.org/3/tutorial/datastructures.html#tut-listcomps)
- [Python Documentation: Lambda Functions](https://docs.python.org/3/tutorial/controlflow.html#tut-lambda)
