# Lambda School Data Science - Artificial General Intelligence and The Future

![Future City](https://upload.wikimedia.org/wikipedia/commons/thumb/c/ce/City-of-the-future.jpg/640px-City-of-the-future.jpg)

# Lecture

## Defining Intelligence

A straightforward definition of Artificial Intelligence would simply be "intelligence, created from technology rather than biology." But that simply raises the question - what is *intelligence*?

In the early history of computers, this seemed like an easier question. Intelligence meant solving tricky problems - things that took time and mental effort for a human to figure out.

Defined that way, computers have made a litany of intelligent achievements over the years:
- Arithmetic
- Logic
- Chess
- Go
- StarCraft
- Mathematical proofs
- Understanding natural language
- Generating natural language
- Understanding images
- Generating images
- Making medical diagnoses
- Fitting and *optimizing* ML models

And many more - every time you fit a simple regression, you're facilitating an act of artificial intelligence. You're writing code that will (hopefully) understand and generalize based on data, giving a "human-like" ability to intuit and predict something.
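To make that concrete, here is a minimal sketch of "a simple regression" using only the Python standard library. The data (hours studied versus exam score) and the helper name `fit_line` are made up for illustration; the point is just that a few lines of arithmetic let a program predict an input it has never seen.

```python
# Minimal sketch: ordinary least squares with one feature, standard library only.
# The data below is hypothetical and exists purely for illustration.

def fit_line(xs, ys):
    """Return (slope, intercept) minimizing squared error for y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x

# Hypothetical training data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 67, 72]

slope, intercept = fit_line(hours, scores)
predicted = slope * 6 + intercept  # generalize to an input not in the training data
print(f"score ~ {slope:.2f} * hours + {intercept:.2f}; prediction for 6 hours: {predicted:.1f}")
```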

## "General" Intelligence - a moving target

But, somehow, that isn't what most people *really* mean when they talk about AI.

![And they both react poorly to showers.](https://imgs.xkcd.com/comics/ai.png)

Somewhere that word "general" snuck in, and now we're concerned about "Artificial General Intelligence." So, what is that?

![Data](https://upload.wikimedia.org/wikipedia/en/0/09/DataTNG.jpg)

The inspiration is likely characters such as the above, but that's not a definition. Intuitively the claim is "computers that can be thrown in a variety of environments and learn without guidance", but another good definition (based on how people use the term) may simply be "whatever we haven't figured out how to get computers to do yet."

Repeatedly, claims are made about tasks that will require a "true AI" to achieve. Then, when those tasks are completed, the bar is moved, and "true AI" is somehow always a bit further off.

## AI - Hype versus Value

Hot off the presses! [Google launches an end-to-end AI platform](https://techcrunch.com/2019/04/10/google-expands-its-ai-services/)!

...

What does that mean? Well, it might mean a lot, but it's a little unclear what. Some selected [Hacker News](https://news.ycombinator.com/item?id=19626275) comments:

> This platform focuses not on the this-AI-is-magic-and-can-solve-everything like many AI SaaS startups announced on Hacker News, but focuses on how to actually integrate this AI into production workflows, which is something I wish was discussed more often in AI. -- minimaxir

> Looks like Google is taking over Cloud (from AWS) for AI by building an ecosystem and building tools for non Data scientists - consumer level product. Surely IBM can do similar thing with their recent Redhat acquisition, but will they ? -- amrrs

> I work in building and deploying production ML/AI models but I'm having a lot of trouble cutting through the marketing jargon in this article and on Google's website as well. Can someone explain what this does in engineering terms? How does this differ from something like AWS Sagemaker? -- chibg10

> This will make a bunch of startup's life really hard. I think it makes it harder to justify investing in your own ML pipeline or even building your own models for many use cases.

One thing it definitely means - AI is a hot keyword, and people making hiring and other corporate decisions will be on the lookout for it, even if they're not sure what it is.

So - yes, you *do* know AI. AI is a real thing, and you are capable of using "artificial" technology to bring about real *intelligence* and insight.

Do you know how to make an intelligent anthropomorphic android? No - and nobody else does yet, either. And that's OK. There are still lots of cool advances to follow and things to learn and build.

## Automation, for good and ill

It is worth spending a moment considering the double-edged sword that is automation. This story did not begin with artificial intelligence, or even statistics or mathematics - it began when the first tool inventor figured out how to make something clever like a lever or a wheel, and use it to reduce the amount of labor needed to achieve some task.

In the modern day we talk about automation, but in practice most technology is best considered as a *productivity multiplier* - all businesses still need at least *some* humans around, if nothing else to make policy decisions and collect profit. But the productivity of each individual person can be greatly enhanced through the use of technology.

Consider farming - formerly a significant source of employment (and also small family-owned farms), technology has transformed it into a large-scale industry where a handful of people produce as much as many more did before. This progression has happened in many areas - fortunately, it is usually accompanied by job growth and opportunity as new markets and services are created by technology as well.

So, is it different now? Maybe - "history will say" is the only safe stance. But we are automating work at an accelerating rate, and it's unclear where all this growth is going and where the opportunities will be. There's a pretty good bet that it'll involve computers and data - and that's probably a large part of why you're here!

The purpose of this section is not to convince you of anything - it is just to make you think. As a Data Scientist, you will have an outsized impact on society, and it is your responsibility to consider that impact and what you want to do with it.

**Important caveat** - think and engage with society, *but* strive to not be strident or unduly certain when you do so. Broadcasting political beliefs, especially while on the job market, usually closes more doors than it opens. So, consider perspectives, and encourage dialogue - don't just (re)broadcast outrage at the latest injustice.

## AutoML - taking our own jobs

We Data Scientists are not immune to automation. Behold, yet another voyage with the RMS Titanic 🚢:

![AutoML applied to Titanic data](https://github.com/minimaxir/automl-gs/raw/master/docs/console-demo.gif)

### Using AutoML on some data you've probably seen before

Let's start with [automl-gs](https://github.com/minimaxir/automl-gs), a very new library that works directly from a CSV file.

In [1]:
!pip install automl_gs

Collecting automl_gs
  Downloading https://files.pythonhosted.org/packages/c4/51/27833a08fe4f83711b09836ddd9128e275a6900c47e0e5782112ed611484/automl_gs-0.2.1.tar.gz
Collecting autopep8 (from automl_gs)
  Downloading https://files.pythonhosted.org/packages/5b/ba/37d30e4263c51ee5a655118ac8c331e96a4e45fd4cea876a74b87af9ffc1/autopep8-1.4.3.tar.gz (113kB)
    100% |████████████████████████████████| 122kB 10.5MB/s 
Collecting pycodestyle>=2.4.0 (from autopep8->automl_gs)
  Downloading https://files.pythonhosted.org/packages/0e/0c/04a353e104d2f324f8ee5f4b32012618c1c86dd79e52a433b64fceed511b/pycodestyle-2.5.0-py2.py3-none-any.whl (51kB)
    100% |████████████████████████████████| 51kB 16.4MB/s 
Building wheels for collected packages: automl-gs, autopep8
  Building wheel for automl-gs (setup.py) ... done
  Stored in directory: /root/.cache

In [19]:
!wget https://github.com/ryanleeallred/datasets/raw/master/car_regression.csv

--2019-04-11 04:03:24--  https://github.com/ryanleeallred/datasets/raw/master/car_regression.csv
Resolving github.com (github.com)... 192.30.255.113, 192.30.255.112
Connecting to github.com (github.com)|192.30.255.113|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/ryanleeallred/datasets/master/car_regression.csv [following]
--2019-04-11 04:03:25--  https://raw.githubusercontent.com/ryanleeallred/datasets/master/car_regression.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 263167 (257K) [text/plain]
Saving to: ‘car_regression.csv’


2019-04-11 04:03:25 (8.69 MB/s) - ‘car_regression.csv’ saved [263167/263167]



In [20]:
!head car_regression.csv

make,price,body,mileage,engV,engType,registration,year,drive
23,15500.0,0,68,2.5,1,1,2010,1
50,20500.0,3,173,1.8,1,1,2011,2
50,35000.0,2,135,5.5,3,1,2008,2
50,17800.0,5,162,1.8,0,1,2012,0
55,16600.0,0,83,2.0,3,1,2013,1
30,6500.0,3,199,2.0,3,1,2003,0
59,10500.0,4,185,1.5,0,1,2011,0
50,21500.0,3,146,1.8,1,1,2012,2
50,22700.0,3,125,2.2,0,1,2010,2


In [21]:
from automl_gs import automl_grid_search

automl_grid_search('car_regression.csv', 'price')

Solving a regression problem, minimizing mse using tensorflow.

Modeling with field specifications:
make: numeric
body: categorical
mileage: numeric
engV: numeric
engType: categorical
registration: categorical
year: numeric
drive: categorical


HBox(children=(IntProgress(value=0), HTML(value='')))

HBox(children=(IntProgress(value=0, max=20), HTML(value='')))


Metrics:
trial_id: e6b5f502-7bea-4a98-b930-5b879d216814
epoch: 20
time_completed: 2019-04-11 04:03:59
mse: 866335309.030034
mae: 16044.953051733859
r_2: -0.4194348823774972


IndexError: ignored

Uh oh, what happened? There is an [open issue](https://github.com/minimaxir/automl-gs/issues/14) which suggests running via the command line `automl_gs` tool rather than the Python module to get better error messages for debugging.
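Separately from the crash, the metrics printed before the error are worth a look: an `r_2` of about -0.42 means the model is doing worse than simply predicting the mean price for every car. The sketch below shows how R² goes negative; the prices and the helper name `r_squared` are hypothetical, not taken from the dataset or from automl-gs.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical car prices, for illustration only
actual = [5000.0, 9500.0, 17000.0, 35000.0]

# Baseline: always predict the mean price
mean_baseline = [sum(actual) / len(actual)] * len(actual)

# A badly underfit model, e.g. one whose outputs are still stuck near zero
poor_model = [0.0] * len(actual)

print("perfect model: ", r_squared(actual, actual))
print("mean baseline: ", r_squared(actual, mean_baseline))
print("underfit model:", r_squared(actual, poor_model))
```

A perfect model scores 1, the mean baseline scores 0, and anything below 0 is losing to the baseline - a hint that the network needs more training or differently scaled inputs.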

In [22]:
!automl_gs car_regression.csv price

Solving a regression problem, minimizing mse using tensorflow.

Modeling with field specifications:
make: numeric
body: categorical
mileage: numeric
engV: numeric
engType: categorical
registration: categorical
year: numeric
drive: categorical
  0% 0/100 [00:00<?, ?trial/s]
  0% 0/20 [00:00<?, ?epoch/s]
  5% 1/20 [00:06<02:05,  6.60s/epoch]
 10% 2/20 [00:07<01:05,  3.65s/epoch]
 15% 3/20 [00:07<00:45,  2.67s/epoch]
 20% 4/20 [00:08<00:34,  2.17s/epoch]
 25% 5/20 [00:09<00:28,  1.87s/epoch]
 30% 6/20 [00:10<00:23,  1.67s/epoch]
 35% 7/20 [00:10<00:19,  1.53s/epoch]
 40% 8/20 [00:11<00:17,  1.42s/epoch]
 45% 9/20 [00:12<00:14,  1.34s/epoch]
 50% 10/20 [00:12<00:12,  1.28s/epoch]
 55% 11/20 [00:13<00:10,  1.22s/epoch]
 60% 12/20 [00:14<00:09,  1.18s/epoch]
 65% 13/20 [00:14<00:07,  1.14s/epoch]
 70% 14/20 [00:15<00:06,  1.11s/epoch]
 75% 15/20 [00:16<00:05,  1.08s/epoch]
 80% 16/20 [00:16<00:04,  1.05s/epoch]
 85% 17/20 [00:17<00:03,  1.03

So, the real issue is in some intermediary step - let's see if we can get rid of engType.

In [23]:
automl_grid_search('car_regression.csv', 'price', col_types={
    'engType': 'ignore'
})

Solving a regression problem, minimizing mse using tensorflow.

Modeling with field specifications:
make: numeric
body: categorical
mileage: numeric
engV: numeric
engType: ignore
registration: categorical
year: numeric
drive: categorical


HBox(children=(IntProgress(value=0), HTML(value='')))

HBox(children=(IntProgress(value=0, max=20), HTML(value='')))


Metrics:
trial_id: 1c5a2168-326b-4d9d-8a40-1a2b5dffe433
epoch: 20
time_completed: 2019-04-11 04:08:00
mse: 870868554.7675673
mae: 16140.953204050678
r_2: -0.4268623149929911


Metrics:
trial_id: dcc41b38-c734-4e34-8679-0d3e1082c1e0
epoch: 20
time_completed: 2019-04-11 04:09:30
mse: 865371717.0329753
mae: 15982.412941232213
r_2: -0.4178560986447455


FileNotFoundError: ignored

It gets further, but is perhaps a bit too bleeding edge for us. Let's try [TPOT](https://github.com/EpistasisLab/tpot).

In [24]:
!pip install tpot

Collecting tpot
  Downloading https://files.pythonhosted.org/packages/95/35/a6cc358b6bb2749d6dffa1ae2427143211f3092904bbb15d1ad317ebe051/TPOT-0.9.6.tar.gz (892kB)
    100% |████████████████████████████████| 901kB 18.6MB/s 
Collecting deap>=1.0 (from tpot)
  Downloading https://files.pythonhosted.org/packages/af/29/e7f2ecbe02997b16a768baed076f5fc4781d7057cd5d9adf7c94027845ba/deap-1.2.2.tar.gz (936kB)
    100% |████████████████████████████████| 942kB 19.0MB/s 
Collecting update_checker>=0.16 (from tpot)
  Downloading https://files.pythonhosted.org/packages/17/c9/ab11855af164d03be0ff4fddd4c46a5bd44799a9ecc1770e01a669c21168/update_checker-0.16-py2.py3-none-any.whl
Collecting stopit>=1.1.1 (from tpot)
  Downloading https://files.pythonhosted.org/packages/35/58/e8bb0b0fb05baf07bbac1450c447d753da65f9701f551dca79823ce15d50/stopit-1.1.2.tar.gz
B

In [25]:
import pandas as pd
from tpot import TPOTRegressor

df = pd.read_csv('car_regression.csv')
df.head()

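| | make | price | body | mileage | engV | engType | registration | year | drive |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 23 | 15500.0 | 0 | 68 | 2.5 | 1 | 1 | 2010 | 1 |
| 1 | 50 | 20500.0 | 3 | 173 | 1.8 | 1 | 1 | 2011 | 2 |
| 2 | 50 | 35000.0 | 2 | 135 | 5.5 | 3 | 1 | 2008 | 2 |
| 3 | 50 | 17800.0 | 5 | 162 | 1.8 | 0 | 1 | 2012 | 0 |
| 4 | 55 | 16600.0 | 0 | 83 | 2.0 | 3 | 1 | 2013 | 1 |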


In [26]:
df.describe()

| | make | price | body | mileage | engV | engType | registration | year | drive |
|---|---|---|---|---|---|---|---|---|---|
| count | 8495.0 | 8495.0 | 8495.0 | 8495.0 | 8495.0 | 8495.0 | 8495.0 | 8495.0 | 8495.0 |
| mean | 46.535491 | 16185.453305 | 2.302295 | 141.744202 | 2.568337 | 1.650618 | 0.941613 | 2006.500883 | 0.575868 |
| std | 24.526251 | 24449.641512 | 1.610307 | 97.464062 | 5.387238 | 1.341282 | 0.234488 | 6.925907 | 0.741235 |
| min | 0.0 | 259.35 | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 | 1959.0 | 0.0 |
| 25% | 23.0 | 5490.0 | 1.0 | 74.0 | 1.6 | 0.0 | 1.0 | 2004.0 | 0.0 |
| 50% | 50.0 | 9500.0 | 3.0 | 130.0 | 2.0 | 1.0 | 1.0 | 2008.0 | 0.0 |
| 75% | 68.0 | 17145.6 | 3.0 | 197.0 | 2.5 | 3.0 | 1.0 | 2011.0 | 1.0 |
| max | 82.0 | 547800.0 | 5.0 | 999.0 | 99.99 | 3.0 | 1.0 | 2016.0 | 2.0 |


In [0]:
from sklearn.model_selection import train_test_split

X = df.drop('price', axis=1).values
X_train, X_test, y_train, y_test = train_test_split(
    X, df['price'].values, train_size=0.75, test_size=0.25)

In [30]:
%%time

tpot = TPOTRegressor(generations=5, population_size=20, verbosity=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))

HBox(children=(IntProgress(value=0, description='Optimization Progress', max=120, style=ProgressStyle(descript…

Generation 1 - Current best internal CV score: -92636917.8904812
Generation 2 - Current best internal CV score: -92636917.8904812
Generation 3 - Current best internal CV score: -87405518.88093515
Generation 4 - Current best internal CV score: -82448619.60956766
Generation 5 - Current best internal CV score: -82448619.60956766

Best pipeline: GradientBoostingRegressor(RobustScaler(input_matrix), alpha=0.9, learning_rate=0.1, loss=ls, max_depth=7, max_features=0.45, min_samples_leaf=2, min_samples_split=17, n_estimators=100, subsample=0.7000000000000001)
-95212641.6186412
CPU times: user 12min 15s, sys: 18.5 s, total: 12min 34s
Wall time: 12min 13s


In [34]:
tpot.predict(X_test)

array([ 6941.76312027,  6490.98506205,  5716.99816711, ...,
        9482.15768688,  8170.80266469, 22037.23983594])

In [35]:
y_test

array([ 7000.,  6100.,  6500., ..., 13200.,  7800., 22700.])

It works - but it looks like we're not quite out of a job yet.
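How good is "works"? The big negative number printed by `tpot.score` is hard to read on its own. Assuming TPOT's default regression scoring here is negative mean squared error (which is what the sign and magnitude suggest), the sketch below converts it to an RMSE in price units and compares it with the spread of `price` from `df.describe()`. All numbers are copied from the outputs above; only the variable names are new.

```python
import math

# Values copied from the notebook outputs above
neg_mse_test = -95212641.6186412  # printed by tpot.score(X_test, y_test); assumed to be negative MSE
price_std = 24449.641512          # std of price from df.describe()

# Root mean squared error, in the same units as price
rmse = math.sqrt(-neg_mse_test)

# The six predictions and targets visible in the truncated array output above
shown_pred = [6941.76312027, 6490.98506205, 5716.99816711,
              9482.15768688, 8170.80266469, 22037.23983594]
shown_true = [7000.0, 6100.0, 6500.0, 13200.0, 7800.0, 22700.0]
abs_errors = [abs(p - t) for p, t in zip(shown_pred, shown_true)]
mae_shown = sum(abs_errors) / len(abs_errors)

print(f"test RMSE: {rmse:,.0f} (price std: {price_std:,.0f})")
print(f"mean absolute error on the six visible rows: {mae_shown:,.0f}")
```

The RMSE comes out well under the standard deviation of price, so the pipeline beats a predict-the-mean baseline, but typical errors are still large relative to the median car price of 9500.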

## So, is AutoML an "AGI"?

**No** - it's an automated search in model and parameter space (a grid search in automl-gs, an evolutionary search in TPOT), with some clever type inference heuristics and a slick interface.
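To show how un-magical a "search in parameter space" is, here is a minimal sketch of an exhaustive grid search using only the standard library. The toy data, the tiny k-nearest-neighbors regressor, and the grid values are all assumptions chosen for illustration; neither automl-gs nor TPOT is implemented this way internally.

```python
from itertools import product

def knn_predict(train, x, k, weighted):
    """Predict y at x from the k nearest (x, y) training pairs."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    if not weighted:
        return sum(y for _, y in nearest) / k
    # Inverse-distance weights; the small constant avoids division by zero
    weights = [1.0 / (abs(xi - x) + 1e-9) for xi, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

def mse(train, valid, k, weighted):
    """Mean squared error of the k-NN regressor on a validation set."""
    errors = [(knn_predict(train, x, k, weighted) - y) ** 2 for x, y in valid]
    return sum(errors) / len(errors)

# Toy data following y = x^2, split into training and validation points
train = [(x, x * x) for x in range(0, 20, 2)]
valid = [(x, x * x) for x in range(1, 19, 4)]

# The "grid": every combination of candidate hyperparameter values
grid = {"k": [1, 2, 3, 5], "weighted": [False, True]}

results = []
for k, weighted in product(grid["k"], grid["weighted"]):
    results.append((mse(train, valid, k, weighted), k, weighted))

# Keep the combination with the lowest validation error
best_score, best_k, best_weighted = min(results)
print(f"best: k={best_k}, weighted={best_weighted}, validation MSE={best_score:.3f}")
```

The loop is just "try everything, keep the best" - the real tools add smarter sampling, more model families, and automatic preprocessing on top of the same idea.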

But, it *is* artificial, it *does* give intelligent results, and (like most technology) it *multiplies* productivity. It's not going to "take our jobs" - but it does mean that, in some situations, one data scientist will be able to do what formerly took several to achieve.

## Is Artificial General Intelligence dangerous?

![I'm working to bring about a superintelligent AI that will eternally torment everyone who failed to make fun of the Roko's Basilisk people.](https://imgs.xkcd.com/comics/ai_box_experiment.png)

There's been much philosophizing, thought experimenting, and even some genuine advocacy and policy considerations about the impact of a "true" AGI on human society. Most of these analyses essentially consider the AGI as an unfathomable deity, thinking and moving in ways well beyond human comprehension.

Consider the [paperclip maximizer](https://en.wikipedia.org/wiki/Instrumental_convergence#Paperclip_maximizer):

> Suppose we have an AI whose only goal is to make as many paper clips as possible. The AI will realize quickly that it would be much better if there were no humans because humans might decide to switch it off. Because if humans do so, there would be fewer paper clips. Also, human bodies contain a lot of atoms that could be made into paper clips. The future that the AI would be trying to gear towards would be one in which there were a lot of paper clips but no humans. -- Nick Bostrom

This is an example of *instrumental convergence* - the idea that an AGI pursuing almost any unbounded goal (even a natural instruction like "Maximize the health of all humans") would converge on the same intermediate goals, such as acquiring resources and preventing its own shutdown, and so might pursue its goal in extremely unexpected ways (putting all humans in vats of goo, to both preserve them and prevent them from disabling it, since its existence is also of value to help humans).

Is this a *realistic* concern? Well, maybe eventually - but pretty obviously not an immediate one. There are many more prominent challenges involving tech and society - privacy, economic growth, equality, education - and even *if* an AGI existed it's not clear how it would have the means to enact such fantastic plans. Killer robot armies make for good TV, but at some step there's likely a human with an off switch.

## Where is AI going, and where does it leave us?

![Lambda calculus? More like SHAMda calculus, amirite?](https://imgs.xkcd.com/comics/ai_research.png)

On the one hand, we live in a remarkable time. The explosion of technology from WWII to present has brought about countless innovations, greatly increased median life expectancy and GDP, and shows no sign of slowing down.

On the other hand, the more things change the more they stay the same. Humans are still Homo sapiens, with the same brains we've had for many millennia. [Dunbar's number](https://en.wikipedia.org/wiki/Dunbar's_number) stymies our attempts to be globally considerate and aware, and at the end of the day it seems like the vast majority of our behavior is as it ever has been - just with shinier toys.

So, what will happen? Will technology usher in a utopia, where automation finally relieves us all of burdensome tasks and we are free to explore science, art, and leisure? Or are we doomed to a dystopia, where increased production is also increasingly centralized and the vast majority of humanity becomes a permanent underclass in a postmodern cyberpunk world?

Probably neither - both are extreme points along a continuum of possibility. But wherever we do end up, it is all but certain that AI (that is, technology generating insights and signal) will be a key part of it.

## And what about A*G*I?

> "I think, therefore I am." -- René Descartes

> "I am a strange loop." -- Douglas Hofstadter

Artificial General Intelligence is, as discussed, a moving target. Perhaps what we're looking for isn't intelligence, but consciousness - and specifically, consciousness *we* recognize and empathize with. Much like all parents, we humans want to foster something new in our image, and see it succeed in a way we appreciate.

It's not clear if technology will ever *really* get there. The structure of artificial intelligence, and our approach to it, is inherently, well, artificial - some things like neural networks are "inspired" by biology, but still very different (far fewer connections, but far faster and trained on more data). Perhaps computers really already *are* intelligent, just not in a way we recognize.

And if we ever do succeed at making our virtual progeny, we may find it bittersweet - not because they will inevitably destroy us (though they probably will outlast us), but simply because it will then lead us to wonder what is so special about us in the first place. If we can create an AGI from metal and sand, then are we not just mechanisms of a different sort?

# Assignment

Use either [automl-gs](https://github.com/minimaxir/automl-gs) or [TPOT](https://github.com/EpistasisLab/tpot) to solve at least two of your prior assignments, projects, or other past work (any time you fit a classification or regression model). Report the results, and compare/contrast with the results you found when you worked on it using your "human" ML approach.

Note - these tools promise a lot, but the reality is that you may have to debug a bit and figure out how to get your data into a format that they recognize. Welcome to the cutting edge - at least there's still plenty of work to do!
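As a starting point for that data wrangling, here is a minimal sketch of turning a mixed-type DataFrame into the all-numeric array that scikit-learn-style estimators generally expect. The columns and values are hypothetical (loosely Titanic-flavored), and median filling plus one-hot encoding are just one reasonable choice, not a requirement of either tool.

```python
import pandas as pd

# Hypothetical raw data with a text column and a missing value
raw = pd.DataFrame({
    "age": [22.0, 38.0, None, 35.0],
    "sex": ["male", "female", "female", "male"],
    "fare": [7.25, 71.28, 7.92, 53.10],
    "survived": [0, 1, 1, 0],
})

# Separate the target from the features
y = raw["survived"].values
features = raw.drop("survived", axis=1)

# Fill missing numeric values with each column's median
numeric_cols = features.select_dtypes(include="number").columns
features[numeric_cols] = features[numeric_cols].fillna(features[numeric_cols].median())

# One-hot encode text columns so that every feature is numeric
features = pd.get_dummies(features).astype(float)

X = features.values  # 2-D float array, ready to pass to a fit(X, y) call
print(features.columns.tolist(), X.shape)
```

From here, `X` and `y` can be split with `train_test_split` and passed to `TPOTRegressor` or `TPOTClassifier` in the same way as the car data in the lecture.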

In [0]:
# TODO - ✨

# Resources and Stretch Goals

Stretch goals
- Apply AutoML to more data, including data you've not analyzed or data you're considering for project work
- Try to work with the GPU/TPU options, and see if you can accelerate your AutoML
- Check out other competing AutoML systems (see resources or search and share - many are cloud hosted, which is why we went with locally installable tools here)
- Write a blog post summarizing your experience learning Data Science at Lambda School!

Resources
- [What to expect from AutoML software](https://epistasislab.github.io/tpot/using/#what-to-expect-from-automl-software)
- [TPOT examples](https://epistasislab.github.io/tpot/examples/)
- [Google Cloud AutoML](https://cloud.google.com/automl/) - the Google offering in the AutoML space (also has vision, video, NLP, and translation)
- [Microsoft AutoML](https://www.microsoft.com/en-us/research/project/automl/)
- [AutoML.org](https://www.automl.org)
- [Ludwig](https://uber.github.io/ludwig/) - a toolbox for deep learning that doesn't require coding, from Uber
- [USENIX Security '18-Q: Why Do Keynote Speakers Keep Suggesting That Improving Security Is Possible?](https://youtu.be/ajGX7odA87k) - a humorous but informative presentation by James Mickens, focused on security but with a consideration of data and machine learning